Knowledge is Here...: 2016

Thursday, August 4, 2016

Java HotSpot(TM) Client VM warning: Cannot open file /cassandra37/logs/gc.log due to No such file or directory

I have been started working with cassandra, cause want to create connection with Pentaho tool, since it is having cassandra inputs. So here while working with Cassandra i faced error, it blocking for to work with it. I have looked the errors and find below error in error manager.

Error
Java HotSpot(TM) Client VM warning: Cannot open file /cassandra37/logs/gc.log due to No such file or directory

Problem
As log clearly tell us "Log directory not exists", so this could be a problem

Solution
Create Log Directory at specified path

Problem
Error occurred during initialization of VM
Could not reserve enough space for 2097152KB object heap

No single argument constructor found for class [Ljava.lang.String;; in 'reader', line 10, column 1:

While setting up of cassandra you might face below error. I faced below error, i did workaround and identified the solution of same. I am going to provide the solutions to all of you guys, to get you out from the any severe problem.

No single argument constructor found for class [Ljava.lang.String;; in 'reader', line 10, column 1:
cluster_name: 'Test Cluster'

Problem:
While modification of cassandra.yaml file you might change the format of given directories

Solution:
Change directories path format as given below (it should be exactly same)

data_file_directories:
    - /cassandra/storage/data
commitlog_directory: /cassandra/storage/commitlog
saved_caches_directory: /cassandra/storage/saved_caches

Workaround
In my case i have written data_file_directories as below

data_file_directories: /cassandra/storage/data

When i change path as per given format, then it works fine for me.

Wednesday, August 3, 2016

WARNING! Powershell script execution unavailable.

PowerShell have four type of execution modes:
1. Restricted
2. All Signed
3. Remote Signed
4. Unrestricted

We need to set any of the one execution mode, please follow below steps, for the solution of same.

Error

WARNING! Powershell script execution unavailable.
   Please use 'powershell Set-ExecutionPolicy Unrestricted'
   on this user-account to run cassandra with fully featured
   functionality on this platform.

Problem
PowerShell have four type of execution modes (Restricted, All Signed, Remote Signed, Unrestricted).
Currently your's machine Powershell Execution mode is set to Restricted.

Solution
Change powershell execution mode for this first open PowerShell with administrative rights "Run as Administrator".

Set-ExecutionPolicy Restricted

Check
Check status of execution mode of Powershell using below command

Unrestricted" goto runPowerShell

Friday, July 29, 2016

How to load data in hadoop?

Hadoop is know as to handle big data, i have been trying to setup connection with Pentaho and Hadoop. So first of all i need to install/setup hadoop (i did this on Red Hat Linux), and then it must contain a table which can hold some data. so in order to load data i used Hive to migrated data in Hadoop tables.

We used HIVE to create table and load data. I am using some pre-downloaded files, after search a lot on google. As all the files have data in tab separated form.

Create table from HIVE
Move to Hadoop_Home and where we need to move the files from root directory to specified directory
Move to HIVE shell
and using Load command, load data into tables.

create table users (id STRING, birth_date STRING, gender STRING) ROW FORMAT DELIMITED FIELDS TERMINATED by '\t' stored as textfile tblproperties ("skip.header.line.count"="1");

create table products (url STRING, category STRING) ROW FORMAT DELIMITED FIELDS TERMINATED by '\t' stored as textfile tblproperties ("skip.header.line.count"="1");

hadoop fs -put /urlmap.tsv /user/hadoop/products.tsv
hadoop fs -put /regusers.tsv /user/hadoop/users.tsv

hive>
LOAD DATA INPATH '/user/hadoop/products.tsv' OVERWRITE INTO TABLE products;
LOAD DATA INPATH '/user/hadoop/users.tsv' OVERWRITE INTO TABLE users;

Wednesday, July 27, 2016

How to clear cache of run (saved password of shared folder)

We generally used RUN command of window to work quickly or open different type of windows shortly (short hand of computer) using commands. We also need help of RUN command to open Network or shared folders if your machine is available in network.

After long time use of RUN command, it save all commands (as history) and it will be available to you for next time. If you do not want to see the history you need to below steps periodically.

Here i am going to tell the steps to delete specific IP address entry from history of RUN command.

Follow below steps to resolve the problem:

Open cmd
Execute command "net use"
Copy string from Remote column, IP address of server whose cache you want to delete
Execute below command

net use \\192.168.1.1\folder(paste your string copied in step 3) /delete

Now on re-execution of net use command, you can see the string entry get removed
Now go to task manager

kill explorer.exe and starting it again.

Tuesday, July 19, 2016

SecondaryNameNode Inconsistent checkpoint fields

While setting up Hadoop on linux server, i have faced below error consistently in log files, it took long to resolve for me, let head sup and re start you hadoop server.

You can locate below similar error in log files.

2016-07-19 12:43:28,702 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -60 namespaceID = 1655575768 cTime = 0 ; clusterId = CID-90fb1076-4e1e-4ab6-bb76-37177e31ad64 ; blockpoolId = BP-532229130-127.0.0.1-1468908471777.
Expecting respectively: -60; 724011492; 0; CID-f9fc0705-4a77-49d1-9220-874fd5f30efe; BP-78957080-127.0.0.1-1456924944705.
at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:134)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:531)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
at java.lang.Thread.run(Thread.java:745)

SOLUTION

Stop all hadoop services (stop-all.sh)
Open logs hadoop-root-secondarynamenode-localhost***.log
Look for path of secondary name node
delete secondary namenode directory
start hadoop services

Hadoop datanode/namenode error

java.io.IOException: Incompatible clusterIDs in /home/hadoop/hadoopdata/hdfs/datanode: namenode clusterID = datanode clusterID =
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:646)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:320)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:403)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:422)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1311)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1276)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:828)
at java.lang.Thread.run(Thread.java:745);

SOLUTION

Check your hdfs-site.xml file to see where dfs.data.dir is pointing to
and delete folder
and then stop and start the datanode

Check you all log files for any error!!!!!!!!!

There might be chances to get below error

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /path/to/hadoop/storage/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:313)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1020)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:739)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
2016-07-19 11:33:16,737 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2016-07-19 11:33:16,838 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-07-19 11:33:16,839 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-07-19 11:33:16,839 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2016-07-19 11:33:16,839 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

PROBLEM
this is because above you have deleted both namenode and datanode folders:

SOLUTION
need to format the namenode

Steps to follow:

stop datanode
delete folder
execute "hdfs namenode -format"
start datanode

Thursday, July 14, 2016

Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

ERROR
hive>show databases;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

SOLUTION

chown -R hdfs:hadoop *

in case above solution doesn't work then try to remove *.lck files, using below command

[root@localhost sbin]# rm metastore_db/*.lck

RuntimeException org.apache.hadoop.security.AccessControlException

ERROR
RuntimeException org.apache.hadoop.security.AccessControlException: Permission denied: user=sa, access=WRITE, inode="/tmp/hive-root":root:supergroup:drwxr-xr-x

SOLUTION
Problem solved by doing change in dfs_permissions. Locate the file hdfs-site.xml and using vim editor change the value property to false.

 <property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

Wednesday, July 6, 2016

The system cannot find the batch label specified - nodemanager

Seems to be you are trying to start yarn or other on Windows and if you downloaded files from internet.
If this is the case then the files you have downloaded or replaced in directory "bin" and "etc/hadoop"

PROBLEM
Some of the files *.cmd having LF as line terminator (use notepad ++ to see)

SOLUTION
Open notepadd++ tool and convert file type as below

Goto> Edit> EOL Conversion> Windows Format

By doing this you can see the changes in file all new line terminator has been changed to CRLF.

Saturday, June 11, 2016

Want to save kettle Logging in database?

However, when you execute job using spoon it produce the logs in UI, the same logs you can also write in database. Here is article how we can write the logs generated during execution in the database. See below:

We can easily save the logs generated by the kettle in database tables. Easiest ways is to change the properties of the job.

Create table in database level. I am using SQLServer in my case and field used in below table are required field.
We have to consider all field while creation of table.

Create table logging
(
id_job int
, channel_id varchar(100)
, jobname varchar(100)
, status varchar(100)
, lines_Read int
, lines_written int
, lines_updated int
, errors int
, startdate datetime
, enddate datetime
, logdate datetime
, depdate datetime
, replaydate datetime
, log_field varchar(max)
);

Now create a job in kettle.
Double click on job (open area) - Job Properties - Log (tab)

Remember that the number of field selected using checkbox in Log_Table_Fields dialogue box, should exists in tables.
In the add-on you can create n number of fields (depend upon the limit of table and database) in same table

Tuesday, June 7, 2016

database url and driver

JDBC drive used to connect between application or database or in many cases as well. such as Any third party tool or application where you want to pull out data. To do this we used JDBC URL/ class paths.

Every server have their own JDBC drivers and url, class paths. URL you need to put in you application where connections needs to define. Also every server have their own jdbc***.jar files, you need to download and place these jar files in lib folder of repository folder to make connection happen and live. So that you can read, edit, delete, the data in database.

sqlserver
database url
jdbc:sqlserver://192.168.1.1:1433;databaseName=XYZ-DatabaseName

spago bi
url - jdbc:sqlserver://192.168.1.1
driver - com.microsoft.sqlserver.jdbc.SQLServerDriver

jtds driver
database url - jdbc:jtds:sqlserver://192.168.1.1:1433/XYZ-DatabaseName

oracle
url - jdbc:oracle:thin:@localhost:1531:XYZ-DatabaseName
driver - oracle.jdbc.driver.OracleDriver

Sunday, June 5, 2016

Playing With SQL Quries

Do you want to play with SQL queries, what a disgusting question, no this is not disgusting, yes you can play, even create games using various codes. Lets create a query to create a triangle. Below SQL query is prepare in Oracle SQL, it can clear your logical skills and usage of SQL function at right place and at right time.

SQL> select lpad(' ',8-level, ' ') ||rpad('*',level-2, '*')
  2  || rpad('*',level-1, '*')
  3   as  a
  4  from dual connect by level < 8
  5  order by level
  6  /

A
---------------

      *
     ***
    *****
   *******
  *********
 ***********

7 rows selected.

Monday, May 30, 2016

Create Folder/Subfolder in transformation java script

This Article related to Pentaho Kettle ETL tool, where we have requirement to create folder, sub folder in physical file directory dynamically using java script. Please follow below steps, you will be definitely able to do this.

We have step in job, using which we can create folder ans sub-folders.

Magically we have "Modified Java Script Value" step in transformation, using which we can also create folder and Subfolders in filesystem.

Steps :

Create new transformation and save it
Add - "Table Input" - "Select Values" - "Modified Java Script Value" in transformation
Table Input step contains - "select 'temp_folder' as path"
Select Values Contains entry of field Path
Modified Java Script Value contains - createFolder('e:/'+path);
Now save and execute it

Sunday, May 8, 2016

Get the list of all tables using JDBC driver using kettle

This is very very beautiful article for me, i bet you guys also enjoy this. The thing is i need information of all tables and columns and keys available in some database and i need all this information at one place and in some different server or some different database.

I searched on google and thanks god i am able to catch up a transformation, which will provide you list of tables, columns, data type of fields, available primary_keys , foreign keys and indexes available with in the database

We can easily get all these information using JDBC connection

Download the transformation from below link

http://wiki.pentaho.com/display/EAI/JDBC+Metadata

List and used function:

data types output - function used getTypeInfo()
tables output - function used getTables()
columns output - function used getTables()
primary key output - function used getPrimaryKeys()
imported key output - function used getImportedKeys()
index info output - function used getIndexInfo()

I have modified this transformation since i need information related to tables and columns only, and end result will be saved in tables of different database

Saturday, May 7, 2016

Define Variable for Database Connection in kettle

Instead of using static Database connection name in spoon UI, We can define database connection variable using kettle.properties file, where we can assign dynamic connection which will be available in through out
the kettle JVM space. Once you need to update the ketlle.properties file to update the connection name or server name, in that case you definitely need to restart your spoon or server. If you do not perform this action then variable contains will contains old connection detail thought the project.

To define variables we need to make change in two files
1. kettle.properties
2. shared.xml

Content of shared.xml

<?xml version="1.0" encoding="UTF-8"?>
<sharedobjects>
  <connection>
    <name>TestConnection</name>
    <server>${DBHostname}</server>
    <type>MSSQL</type>
    <access>Native</access>
    <database>${DBName}</database>
    <port>1433</port>
    <username>${DBUsername}</username>
    <password>${DBPassword}</password>
    <servername/>
    <data_tablespace/>
    <index_tablespace/>
    <attributes>
      <attribute><code>FORCE_IDENTIFIERS_TO_LOWERCASE</code><attribute>N</attribute></attribute>
      <attribute><code>FORCE_IDENTIFIERS_TO_UPPERCASE</code><attribute>N</attribute></attribute>
      <attribute><code>IS_CLUSTERED</code><attribute>N</attribute></attribute>
      <attribute><code>MSSQL_DOUBLE_DECIMAL_SEPARATOR</code><attribute>N</attribute></attribute>
      <attribute><code>PORT_NUMBER</code><attribute>1433</attribute></attribute>
      <attribute><code>PRESERVE_RESERVED_WORD_CASE</code><attribute>N</attribute></attribute>
      <attribute><code>QUOTE_ALL_FIELDS</code><attribute>N</attribute></attribute>
      <attribute><code>SUPPORTS_BOOLEAN_DATA_TYPE</code><attribute>Y</attribute></attribute>
      <attribute><code>SUPPORTS_TIMESTAMP_DATA_TYPE</code><attribute>Y</attribute></attribute>
      <attribute><code>USE_POOLING</code><attribute>N</attribute></attribute>
    </attributes>
  </connection>

Content of kettle.properties
#TestConnection
DBHostname=localhost
DBName=XYZ0001
DBUsername=sa
DBPassword=******

Now Close your spoon UI if opened (this action is required to reflect your changes in spoon/UI of ETL tool)
Move to directory **/.kettle
delete any backup files if exists
start spoon.bat file

In above picture you can see the newly added variable in shared.xml file and the variable values available in kettle.properties file.

Friday, April 1, 2016

ORA-00119: invalid specification for system parameter LOCAL_LISTENER

Welcome back after a weekend off, every body is waiting for weekend, we too. Some times when return to office some of the stuffs not working properly. Do not know why, any alien?

Some thing happen, we returned and tried to startup services, it was not working, throws below error. Without database we can not start our work, so need to fix this on higher priority.

I followed below step to fix this and resolved and working fine, you guys can also follow below step if you have faced the same problem. Do not afraid just fire the commands after verifying the paths.

ORA-00119: invalid specification for system parameter LOCAL_LISTENER
ORA-00132: syntax error or unresolved network name 'LISTENER_ORCL'

Problem
E:\app\*****\admin\orcl\pfile
Open pfile in editor

local_listener=LISTENER_ORCL

Solution
Update the below entry (orcl is a local database sid)
local_listener=orcl

startup pfile='E:\app\*****\admin\orcl\pfile\init.ora'

create spfile from pfile='E:\app\*****\admin\orcl\pfile\init.ora' ;

shu immediate

startup

Thursday, March 17, 2016

ORA-39095: Dump file space has been exhausted: Unable to allocate 8192 bytes

This error can be seen during import export of database in Oracle server. Error can be throw because of wrong command fired on prompt. I my case i have been using fixed dump file name with parallel processing.
In order to take database backup we need to use different syntax so that we will not face this error in future. I have writing my observation below, how i resolved this error. Please provide your valuable comments if it works.

I offered a command to take database backup in oracle and ended up with error after a few minutes

expdp scott/tiger dumpfile=scott.dmp directory=DATA_PUMP_DIR parallel=4 status=1

Worker 4 Status:
Process Name: DW03
State: EXECUTING
Object Schema: scott
Object Name: emp
Object Type: SCHEMA_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 1014
Worker Parallelism: 3
ORA-39095: Dump file space has been exhausted: Unable to allocate 8192 bytes

SOLUTION
Use %U in dumpfile creation, expdp will automatically take care of number of file need to created as per size of database size

expdp scott/tiger dumpfile=scott_%U.dmp directory=DATA_PUMP_DIR parallel=4

Tuesday, March 15, 2016

want to divide table into 10 equal parts

If you want to divide table into 10 equal parts then you might need to min max of number of primary field (field which holds the primary key)
Instead of manual work, you can easily identify the min max for division purpose, you can use below query

select
min(object_id) as v_start
,max(object_id) as v_end
,num
from
(
select
object_id,
--divide all rows into group number
ntile(10) over (order by object_id) num
from all_objects
)
group by num;

You can use above logic any where for further logic's.

Sunday, March 13, 2016

How to auto delete macro code after execution

We can also auto delete macro code written in excel workbook. Some times workbooks are generated thru the java/.net and other automation process, it can have macro codes or can have compatibility issues for further usage of spreadsheet/workbook. To remove those issues we have to delete macro codes. See below step we need to perform to achieve the same.

Lets edit macro code from previous post.

Sub Ato_Open()

    Dim wk As Worksheet
    ' Declare object variable
    Dim vbCom As Object
    Set vbCom = Application.VBE.ActiveVBProject.VBComponents

    MsgBox "It will Fill Values in cell of all sheets"
    For Each wk In ThisWorkbook.Worksheets
    wk.Activate
    With wk.Range("I5")
    .FormulaR1C1 = "test 1 2 3"
    End With
    Next wk

vbCom.Remove VBComponent:= _
vbCom.Item("Module1")

End Sub

* note - highlighed/Bold code is newly added in previous post.

Now save the macro code close and reopen the workbook.

ERROR:
Run-time error '1004':

Programmatic access to Visual Basic Projects is not trusted

SOLUTION
Please look for below screen shot and follow the same process and enable the trust access to the VBA code

Close and reopen the workbook.. It works!!!!!!

Auto Open [Run] macro code - when workbook open

Lets have look on older post, which is showing that how to execute macro code for individual sheet of a workbook.

http://j4info.blogspot.in/2013/05/execute-macro-for-each-sheet-in-workbook.html

Earlier we were able to manage to execute macro code for each sheet available in spreadsheet or Workbook. We had achieved one task and now mind think why i need to execute macro code manually, it should be executed automatically as soon as i open the spreadsheet. I started working on this, it is well said nothing is impossible. we did this, see below.

In this post we will do workaround that how to auto open macro code and start its working after immediate open of workbook

Sub Auto_Open()
    Dim wk As Worksheet
    MsgBox "Fill Values in cell of all sheets"
    For Each wk In ThisWorkbook.Worksheets
    wk.Activate
    With wk.Range("I5")
    .FormulaR1C1 = "testqqq"
    End With
    Next wk
End Sub

USe Auto_Open() thats it....

Friday, February 26, 2016

Add/upload documents in image or varbinary datatype

This post related to SQL Server, question arise in my mind where all the documents goes in database, if we can save all documents images, videos in database, answer is yes, we can do. I have started looking into this, how we can do this. So my knowledge which contains couple of hour and success is below. You try in SQLServer and create new table, then place any document at right place and change the path and execute the below query.

create table documents(id int not null, doctype varchar(4), content varbinary(max));

INSERT INTO documents(id, doctype, content)
SELECT
    2 as id,
    '.pdf' AS doctype,
    * FROM OPENROWSET(BULK N'D:\common\ebooks2\India-Year-Book-2009.pdf', SINGLE_BLOB) AS Document;
GO

select * from documents;
go

Add comments if you like the post.

How to pull out blob content out to a directory using kettle

This post related to , if you had saved the blob data in your table then how to get your data out from a database table, its quiet easy. Some of the database also have their own algorithm of doing this task, but i am doing this with ETL kettle tool which is pretty fast(its not pretty its incredibly fast), proceed with follow steps:

1. create table, add some blob data
2. create transformation and execute it

1. Create table and add blob data
follow below step/post

CREATE TABLE t (id NUMBER(10),doc BLOB, name varchar2(30) );

CREATE SEQUENCE t_seq;

CREATE OR REPLACE DIRECTORY documents AS 'd:\app';

CREATE OR REPLACE PROCEDURE upload_t (file_name_v IN varchar) AS
v_bfile BFILE;
v_blob BLOB;
BEGIN
INSERT INTO t (id, doc,name) vALUES (t_seq.NEXTVAL, empty_blob(),file_name_v) RETURN doc INTO v_blob;
v_bfile := BFILENAME('DOCUMENTS', file_name_v);
Dbms_Lob.Fileopen(v_bfile, Dbms_Lob.File_Readonly);
Dbms_Lob.Loadfromfile(v_blob, v_bfile, Dbms_Lob.Getlength(v_bfile));
Dbms_Lob.Fileclose(v_bfile);

COMMIT;
end;
/

EXEC upload_t('New Text Document.txt');

2. Build transformation as per below image:

Add below java script to step "Modified Java Script Value"

var File = java.io.File;
var FileOutputStream = java.io.FileOutputStream;
var BufferedOutputStream = java.io.BufferedOutputStream;
var out_msg;
var err_msg;
var filename = substr(getVariable('Internal.Transformation.Filename.Directory', '') +'/'+file_name, 7);
var outBytes = file;
try{
    var f = new File(filename);
      if (!f.exists()){
        f.createNewFile();
    }
 
    var fos = new FileOutputStream(f);
    var bos = new BufferedOutputStream(fos);
    bos.write(outBytes);
    bos.flush();
    bos.close();
 
    out_msg = 'succeed';
//    err_msg = 'OK';
 
}
catch (e){
    out_msg = 'Failed';
    err_msg = e.javaException.toString();
}

Save and execute transformation

It will fetch the blob data and build the file at path same as of transformation path. Moreover you can modify javascript to add custom path.

NOTE*- file_name must contain the value as file name with postfix extension (format - filename.extension).

BENIFIT - benefit of using this code is it will work for ORACLE and SQLServer Database (these two are tested by me)
Oracle - Blob datatype
SQLServer - Image and Varbinary(max) datatype

Wednesday, February 24, 2016

Call job through kitchen, stop displaying error on batch [kettle]

Scenario -

I am calling a job through batch, and job contains verification of database connection active inactive. In case if connection parameters are wrong then step 'Check Db connections' will throw error on batch file.

(Batch file code as below):

call kitchen /file:"%CD%\Data_Migration\zzzzzzzzz.kjb" /norep:Y /level:basic > ".\TEST%systemtime%.log"

Open CMD> execution *.bat file

E:\z_temp\kettle_5_4>test.bat
2016/02/24 13:36:56 - Check Db connections 2 - ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) :
2016/02/24 13:36:56 - Check Db connections 2 - Error occurred while trying to connect to the database
2016/02/24 13:36:56 - Check Db connections 2 -
2016/02/24 13:36:56 - Check Db connections 2 - Error connecting to database: (using class net.sourceforge.jtds.jdbc.Driver)
2016/02/24 13:36:56 - Check Db connections 2 - Login failed for user 'sa'.
2016/02/24 13:36:56 - Check Db connections 2 -
2016/02/24 13:36:56 - Check Db connections 2 - ]

Solution

If you don't want to show errors on log file then made changes as below, while call batch file

call kitchen /file:"%CD%\Data_Migration\zzzzzzzzz.kjb" /norep:Y /level:basic > ".\TEST%systemtime%.log" 2>&1

E:\z_temp\kettle_5_4>test.bat
E:\z_temp\kettle_5_4>

Sunday, February 14, 2016

Function to process each character of String [SQL server]

Some time nested function will not work, what is the solution? !!!!!!

Want to process each character of string???

--drop function dbo.replacechars

create FUNCTION replacechars (@InStr varchar(max))
RETURNS varchar(max)
AS
BEGIN
declare @alter_string varchar(max) , @length int , @cnt int , @StoreChar varchar(100)
SET @alter_string = ''
SET @length = 0
SET @cnt = 1
SET @length = LEN(@InStr)
IF @length > 0
BEGIN WHILE @cnt <= @length
BEGIN
SET @StoreChar = SUBSTRING(@InStr, @cnt, 1)
set @StoreChar =
case when isnull(@StoreChar,' ') = ';' then '&#.x3b;'
when isnull(@StoreChar,' ') = '#' then '&#.x23;'
when isnull(@StoreChar,' ') = '&' then '&#.x26;'
when isnull(@StoreChar,' ') = '!' then '&#.x21;'
when isnull(@StoreChar,' ') = '@' then '&#.x40;'
when isnull(@StoreChar,' ') = '$' then '&#.x24;'
when isnull(@StoreChar,' ') = '%' then '&#.x25;'
when isnull(@StoreChar,' ') = '^' then '&#.x5e;'
when isnull(@StoreChar,' ') = '*' then '&#.x2a;'
when isnull(@StoreChar,' ') = '(' then '&#.x28;'
when isnull(@StoreChar,' ') = ')' then '&#.x29;'
when isnull(@StoreChar,' ') = '=' then '&#.x3d;'
when isnull(@StoreChar,' ') = '+' then '&#.x2b;'
when isnull(@StoreChar,' ') = '{' then '&#.x7b;'
when isnull(@StoreChar,' ') = '}' then '&#.x7d;'
when isnull(@StoreChar,' ') = '[' then '&#.x5b;'
when isnull(@StoreChar,' ') = ']' then '&#.x5d;'
when isnull(@StoreChar,' ') = '\' then '&#.x5c;'
when isnull(@StoreChar,' ') = '|' then '&#.x7c;'
when isnull(@StoreChar,' ') = '''' then '&#.x27;'
when isnull(@StoreChar,' ') = ':' then '&#.x3a;'
when isnull(@StoreChar,' ') = '"' then '&#.x22;'
when isnull(@StoreChar,' ') = '<' then '&#.x3c;'
when isnull(@StoreChar,' ') = '>' then '&#.x3e;'
when isnull(@StoreChar,' ') = '?' then '&#.x3f;'
when isnull(@StoreChar,' ') = '/' then '&#.x2f;'
when isnull(@StoreChar,' ') = CHAR(160) then '&#.xa0;'
when isnull(@StoreChar,' ') = CHAR(10) then '&#.xd;&#.xa;'
else isnull(@StoreChar,' ') end
BEGIN SET @alter_string = @alter_string + isnull(@StoreChar,' ') END
SET @cnt = @cnt + 1
END END
RETURN @alter_string END

Note - PLEASE remove . (dot) from string written after then clause

select dbo.replacechars('a ; f5637# $ % ^ ')

Replace Special Char from string for "table input step" kettle transformation

Some requirement i faced, to write dynamic SQL script for kettle transformation. SQL scripts are generally written in "table input step" and "SQL script" steps.

Observation i have made with two different version of data integration tool (kettle by pentaho) with 4.4 and 6.0.

Version 4.4
When we wrote some SQL script in transformation (table input step) , and if we open the saved transformation in notepad, i observed that the SQL script was in same format enclosed with html tags.

Version 5.4 and above (my observation, version are not accurate)
When i wrote some SQL script in transformation and i open the transformation in notepad, observed that the SQL script written in Spoon (developers mode), now has been in changed format. i seen that the special chars used in SQL script has been changed to some numbers.

I have listed out some special char conversions, as in below image:

Now the main task is how i can replace all above char in incoming String from any source.

Nested replace function will not work here, because replacing string contains itself special chars.

So i created own function, which will process each char of incoming string.

To see the function please refer to below link

Function to process each character of String [SQL server]

Friday, February 12, 2016

Split string on the first occurrence of underscore using kettle

Having a string contain underscore (string can have multiple underscores), want to split on first occurrence of underscore into two parts in kettle. To achieve this I used ‘Java Script’ step, which is quite easy and can perform using single step.

Below is the code for ‘Modified Java Script Value’ step:

var ToBeSplit = InString;

var IndexLevel = ToBeSplit.indexOf("_");

var LeftTrim = ToBeSplit.substr('_',IndexLevel);

var RightTrim = ToBeSplit.substr(IndexLevel+1);

Saturday, January 9, 2016

SQLServer - Integer Format

Query1
select id from table_name

Query2
select cast(format(row_number() over (order by id) ,'0000#') as varchar) from table_name

Results of above two query in below picture.

I ran same query in SQLServer Version 2008, as "format" function is not available in 2008, so below is the altrenative lof same query in SQLServer 2008 version.

Error
Msg 195, Level 15, State 10, Line 1
'format' is not a recognized built-in function name.

select RIGHT ('0000'+ CAST(row_number() over (order by id) AS varchar), 5) from tabe_name

Friday, January 8, 2016

SQLSERVER - licensed limit of 10240 MB per Databae

After failed for Database (Microsoft.SqlServer.Smo)

Error:
An Exception occured while executing a Transact-SQL statement or batch.
(Microsoft.SqlServer.ConnectionInfo)

CREATE DATABASE or ALTER DATABASE failed because the resulting cumulative database
size would exceed you licensed limit of 10240 MB per Databae.
(Microsoft SQL Server, Error:1827)

Limitation:
If you are using MS SQL Server express edition you will not able to create database having size greater than 10 GB.

Thursday, January 7, 2016

Kettle log contains string - ERROR (version 5.4.0.1-130, build 1 from ** by buildguy)

ERROR (version 5.4.0.1-130, build 1 from ** by buildguy)

In kettle log i have found a string continuously printing until job completed.

How to remove this string from logs?

Open Job for which you are generating logs, (In my case i have another "Job step" in main job and generating separate logs for that job )
Double click on "Job Entry Step"
Popup can be seen (Heading - Executing a Job)
Move to tab "Logging Settings"
Change Loglevel from "Error Logging Only" to "Basic "

Friday, January 1, 2016

Table Input/Evaluate rows number in a table/Execute SQL step, On error abort all job sequence

This error is somewhere related to my previous post

kettle - Filter step not working initialize both input steps link

picture 1
i have some set of queries as you can see in above picture highlighted in green box , bearing type of 's' and 't'. That mean i have to filter records with field "type"

If type - 'S' then - execute query with connection 1
If type - 'T' then - execute query with connection 2

In previous post i suggest to use step "Simple evaluation" at job level to filter rows, it works fine
but i faced some another condition on the way :

REQUIREMENT
I have to execute all the four queries with their respective type of connection, and log the result, i.e. query execution success and failure

OOPS what happen !!!!!!!!
look in the picture the query highlighted with yellow color having wrong table name, now it is obvious query got fails while execution and it leads to abort the whole job process.
OH IT NOT MET TO MY SCENARIO (scenario is log the result, for query execution success and failure)

SOLUTION
to met above condition i change the orientation of my data-set, as highlighted in RED BOX.
Set variable for each different column

Test Query at job level ..see below image(job in loop)

Pages

Labels