Thursday, May 31, 2012

Quick Hadoop 2-node Cluster Setup for Oracle DBAs


OS Version: Red Hat Enterprise Linux Server release 5.6 
Hadoop Version: 1.0.1
Hosts: host1, host2


Step 1: Install Java on the Hosts

yum install java-1.6.0-openjdk-devel.x86_64

Check the Java version:
java -version

Step 2: Create OS Group and User on hosts

Add Group:
/usr/sbin/groupadd hdfs

Add User:
/usr/sbin/useradd -G hdfs -d /home/hadoop hadoop

Change user password:
passwd hadoop

Step 3: Set up passwordless ssh between the hadoop users on both hosts

Log in as the hadoop user on both hosts:
hadoop@host1:
ssh-keygen -t rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Copy $HOME/.ssh/authorized_keys to /tmp/ on host2 (e.g. with scp)
hadoop@host2:
ssh-keygen -t rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
cat /tmp/authorized_keys >> $HOME/.ssh/authorized_keys
Now copy the combined $HOME/.ssh/authorized_keys back to /tmp on host1
chmod 700 $HOME/.ssh
cd $HOME/.ssh
chmod 600 *
hadoop@host1:
cp /tmp/authorized_keys $HOME/.ssh/authorized_keys
chmod 700 $HOME/.ssh
cd $HOME/.ssh
chmod 600 *
ssh host1
ssh host2
hadoop@host2:
ssh host1
ssh host2

This sets up passwordless ssh between the hadoop users on both hosts; each of the ssh commands above should now log in without prompting for a password.

Step 4: Download and install hadoop

Download location: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Download the release of your choice.
Download hadoop-1.0.1.tar.gz, then gunzip and untar it.
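For example, using the Apache archive mirror (any mirror from the link above works; verify the URL for your release):

cd /home/hadoop
wget http://archive.apache.org/dist/hadoop/common/hadoop-1.0.1/hadoop-1.0.1.tar.gz
tar -xzf hadoop-1.0.1.tar.gz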

mv hadoop-1.0.1 hadoop

Hadoop location: /home/hadoop/hadoop

Step 5: Configure Hadoop


5.a: Update JAVA_HOME in /home/hadoop/hadoop/conf/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64

5.b: Add the following entry inside the <configuration> element of /home/hadoop/hadoop/conf/core-site.xml. Note: on a true multi-node cluster, fs.default.name should normally point to the master node (e.g. hdfs://host1:54310) rather than localhost; the same goes for mapred.job.tracker in 5.c.



<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>



5.c: Add the following inside the <configuration> element of /home/hadoop/hadoop/conf/mapred-site.xml


<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>



5.d: Add the following inside the <configuration> element of /home/hadoop/hadoop/conf/hdfs-site.xml



<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

Note: dfs.replication is set to 1 here; with two datanodes you can raise it to 2 if you want every block stored on both nodes.


5.e: Update /home/hadoop/hadoop/conf/masters with the name of the master node

[hadoop@host1 conf]$ more masters
host1

5.f: Update /home/hadoop/hadoop/conf/slaves with the names of the slave nodes

[hadoop@host1 conf]$ more slaves
host1
host2

5.g: Format the name node


[hadoop@host1 ~]$ hadoop namenode -format
12/05/09 09:42:03 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = host1/66.228.160.91
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
12/05/09 09:42:03 INFO util.GSet: VM type       = 64-bit
12/05/09 09:42:03 INFO util.GSet: 2% max memory = 17.77875 MB
12/05/09 09:42:03 INFO util.GSet: capacity      = 2^21 = 2097152 entries
12/05/09 09:42:03 INFO util.GSet: recommended=2097152, actual=2097152
12/05/09 09:42:03 INFO namenode.FSNamesystem: fsOwner=hadoop
12/05/09 09:42:03 INFO namenode.FSNamesystem: supergroup=supergroup
12/05/09 09:42:03 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/05/09 09:42:03 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/05/09 09:42:03 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/05/09 09:42:03 INFO namenode.NameNode: Caching file names occuring more than 10 times 
12/05/09 09:42:03 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/05/09 09:42:03 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
12/05/09 09:42:03 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at host1/66.228.160.91
************************************************************/


Step 6: Start up Hadoop


[hadoop@host1 bin]$ ./start-all.sh 
starting namenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-namenode-host1.out
host2: starting datanode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-host2.out
host1: starting datanode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-host1.out
host1: starting secondarynamenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-host1.out
host1: Exception in thread "main" java.io.IOException: Cannot lock storage /home/hadoop/temp/dfs/namesecondary. The directory is already locked.
host1: at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
host1: at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
host1: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:615)
host1: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:175)
host1: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:129)
host1: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)
starting jobtracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-host1.out
host2: starting tasktracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-host2.out
host1: starting tasktracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-host1.out


Note: the SecondaryNameNode "Cannot lock storage ... already locked" exception above usually means a stale lock file (in_use.lock) left behind by a previous run; stop the cluster, clear the namesecondary directory, and restart, and it should come up cleanly.

Checks:
[hadoop@host1 bin]$ jps
20358 JobTracker
20041 DataNode
20528 TaskTracker
19869 NameNode
27905 Jps
26450 oc4j.jar
20229 SecondaryNameNode

[hadoop@host2 ~]$ jps
12522 Jps
1737 TaskTracker
1572 DataNode

jps (Java Virtual Machine Process Status Tool) lists the Java VMs running.
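Another quick check is the HDFS status report, which should list both datanodes:

[hadoop@host1 ~]$ hadoop dfsadmin -report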

Step 7: Running a basic Map Reduce Test


I am using the sample test from: http://wiki.apache.org/hadoop/WordCount

7.a: Create directories in Hadoop:

[hadoop@host1 ~]$ hadoop dfs -mkdir input
[hadoop@host1 ~]$ hadoop dfs -mkdir output

Note: the WordCount job will fail if its output directory already exists at run time, so remove it (hadoop dfs -rmr output) before re-running the test.

Hadoop shell reference: http://hadoop.apache.org/common/docs/r0.17.1/hdfs_shell.html
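A few other FS shell commands that come in handy while testing (all standard hadoop dfs sub-commands):

hadoop dfs -ls /
hadoop dfs -cat input/file1
hadoop dfs -rmr output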

7.b: Copy the WordCount.java source from the page above to /home/hadoop/WordCount.java and compile it

Create a directory for the compiled classes:
mkdir -p /home/hadoop/wordcount

[hadoop@host1 ~]$ javac -classpath /home/hadoop/hadoop/hadoop-core-1.0.1.jar -d wordcount WordCount.java
[hadoop@host1 ~]$ jar -cvf /home/hadoop/wordcount.jar -C wordcount/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/WordCount$Map.class(in = 1938) (out= 798)(deflated 58%)
adding: org/myorg/WordCount.class(in = 1546) (out= 749)(deflated 51%)
adding: org/myorg/WordCount$Reduce.class(in = 1611) (out= 649)(deflated 59%)

This will create the jar file to be used in the test.

7.c: Create input files:

file1:
Hello World Bye World
file2:
Hello Hadoop Goodbye Hadoop
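These can be created locally with simple echo commands:

echo "Hello World Bye World" > file1
echo "Hello Hadoop Goodbye Hadoop" > file2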

7.d: Copy the files over to Hadoop:

[hadoop@host1 ~]$ hadoop dfs -copyFromLocal file1 input
[hadoop@host1 ~]$ hadoop dfs -copyFromLocal file2 input
[hadoop@host1 ~]$ hadoop dfs -ls input
Found 2 items
-rw-r--r--   1 hadoop supergroup         23 2012-05-09 09:44 /user/hadoop/input/file1
-rw-r--r--   1 hadoop supergroup         28 2012-05-09 09:44 /user/hadoop/input/file2

7.e: Run the Map Reduce Java program:


[hadoop@host1 ~]$ hadoop jar /home/hadoop/wordcount.jar org.myorg.WordCount /user/hadoop/input /user/hadoop/output
12/05/30 16:03:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/30 16:03:44 INFO mapred.FileInputFormat: Total input paths to process : 2
12/05/30 16:03:44 INFO mapred.JobClient: Running job: job_201205091119_0004
12/05/30 16:03:45 INFO mapred.JobClient:  map 0% reduce 0%
12/05/30 16:03:58 INFO mapred.JobClient:  map 100% reduce 0%
12/05/30 16:04:10 INFO mapred.JobClient:  map 100% reduce 100%
12/05/30 16:04:15 INFO mapred.JobClient: Job complete: job_201205091119_0004
12/05/30 16:04:15 INFO mapred.JobClient: Counters: 30
12/05/30 16:04:15 INFO mapred.JobClient:   Job Counters 
12/05/30 16:04:16 INFO mapred.JobClient:     Launched reduce tasks=1
12/05/30 16:04:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19996
12/05/30 16:04:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/05/30 16:04:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/05/30 16:04:16 INFO mapred.JobClient:     Launched map tasks=3
12/05/30 16:04:16 INFO mapred.JobClient:     Data-local map tasks=3
12/05/30 16:04:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10031
12/05/30 16:04:16 INFO mapred.JobClient:   File Input Format Counters 
12/05/30 16:04:16 INFO mapred.JobClient:     Bytes Read=55
12/05/30 16:04:16 INFO mapred.JobClient:   File Output Format Counters 
12/05/30 16:04:16 INFO mapred.JobClient:     Bytes Written=41
12/05/30 16:04:16 INFO mapred.JobClient:   FileSystemCounters
12/05/30 16:04:16 INFO mapred.JobClient:     FILE_BYTES_READ=79
12/05/30 16:04:16 INFO mapred.JobClient:     HDFS_BYTES_READ=412
12/05/30 16:04:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=86679
12/05/30 16:04:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
12/05/30 16:04:16 INFO mapred.JobClient:   Map-Reduce Framework
12/05/30 16:04:16 INFO mapred.JobClient:     Map output materialized bytes=91
12/05/30 16:04:16 INFO mapred.JobClient:     Map input records=2
12/05/30 16:04:16 INFO mapred.JobClient:     Reduce shuffle bytes=51
12/05/30 16:04:16 INFO mapred.JobClient:     Spilled Records=12
12/05/30 16:04:16 INFO mapred.JobClient:     Map output bytes=82
12/05/30 16:04:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=443023360
12/05/30 16:04:16 INFO mapred.JobClient:     CPU time spent (ms)=1440
12/05/30 16:04:16 INFO mapred.JobClient:     Map input bytes=51
12/05/30 16:04:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=357
12/05/30 16:04:16 INFO mapred.JobClient:     Combine input records=8
12/05/30 16:04:16 INFO mapred.JobClient:     Reduce input records=6
12/05/30 16:04:16 INFO mapred.JobClient:     Reduce input groups=5
12/05/30 16:04:16 INFO mapred.JobClient:     Combine output records=6
12/05/30 16:04:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=576110592
12/05/30 16:04:16 INFO mapred.JobClient:     Reduce output records=5
12/05/30 16:04:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2767060992
12/05/30 16:04:16 INFO mapred.JobClient:     Map output records=8


7.f: Check for the output in hadoop:

[hadoop@host1 ~]$ hadoop dfs -ls output
Found 3 items
-rw-r--r--   2 hadoop supergroup          0 2012-05-09 11:22 /user/hadoop/output/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2012-05-09 11:22 /user/hadoop/output/_logs
-rw-r--r--   2 hadoop supergroup         41 2012-05-09 11:22 /user/hadoop/output/part-00000

7.g: Check the output:

[hadoop@host1 ~]$ hadoop dfs -cat /user/hadoop/output/part-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2

This completes the quick setup of a 2-node Hadoop cluster.
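When you are done, the whole cluster can be stopped from the master with the companion script:

[hadoop@host1 bin]$ ./stop-all.sh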

While setting this up I referred to the following links. My thanks and regards for all the help:

For single node setup: http://hadoop.apache.org/common/docs/current/single_node_setup.html#Local
For cluster setup: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
http://hortonworks.com/blog/set-up-apache-hadoop-in-minutes-with-rpms/

Saturday, May 26, 2012

Setup Data replication from Oracle to MongoDB


Credits: http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-using-streams-advanced-queuing/

Components used:
MongoDB: 2.0.5 (shell version)
Oracle: 11.2.0.2
Oracle Advanced Queuing
Perl scripting
DBD::Oracle: the Perl DBD driver for Oracle
JSON Perl module

The model for this setup is:
Replicate the SCOTT.EMP table from Oracle to MongoDB.
Use Oracle AQ to queue inserts, updates and deletes, and a Perl script to dequeue and apply them in MongoDB.

Step 1:
Use the EMP table available in the SCOTT schema in Oracle.
The structure is:

SQL> desc emp

 Name      Type
 --------- ------------
 EMPNO     NUMBER(4)
 ENAME     VARCHAR2(10)
 JOB       VARCHAR2(9)
 MGR       NUMBER(4)
 HIREDATE  DATE
 SAL       NUMBER(7,2)
 COMM      NUMBER(7,2)
 DEPTNO    NUMBER(2)


Step 2: As a DBA user, grant execute on DBMS_AQ and DBMS_AQADM to SCOTT


SQL> grant execute on DBMS_AQ to SCOTT;


Grant succeeded.


SQL> grant execute on DBMS_AQADM to SCOTT;


Grant succeeded.




Step 3: Setup Queue Table in Oracle


CREATE TYPE EMP_T AS OBJECT (MSG VARCHAR2(4000));


BEGIN
-- THIS PROCEDURE CREATES A QUEUE TABLE FOR MESSAGES OF A PREDEFINED TYPE.
DBMS_AQADM.CREATE_QUEUE_TABLE(
  QUEUE_TABLE => 'EMPQUEUE_QT',
  QUEUE_PAYLOAD_TYPE => 'EMP_T'
);


-- This procedure creates a queue in the specified queue table.
DBMS_AQADM.CREATE_QUEUE(
  QUEUE_NAME => 'EMPQUEUE_Q',
  QUEUE_TABLE => 'EMPQUEUE_QT'
);


-- This procedure enables the specified queue for enqueuing or dequeuing.
DBMS_AQADM.START_QUEUE(
  QUEUE_NAME => 'EMPQUEUE_Q'
);
END;
/


QUEUE TABLE Structure:
SQL> desc AQ$EMPQUEUE_QT
 Name   Null?    Type
 ----------------------------------------- -------- ----------------------------
 QUEUE    VARCHAR2(30)
 MSG_ID   NOT NULL RAW(16)
 CORR_ID    VARCHAR2(128)
 MSG_PRIORITY    NUMBER
 MSG_STATE    VARCHAR2(16)
 DELAY    DATE
 DELAY_TIMESTAMP    TIMESTAMP(6)
 EXPIRATION    NUMBER
 ENQ_TIME    DATE
 ENQ_TIMESTAMP    TIMESTAMP(6)
 ENQ_USER_ID    VARCHAR2(30)
 ENQ_TXN_ID    VARCHAR2(30)
 DEQ_TIME    DATE
 DEQ_TIMESTAMP    TIMESTAMP(6)
 DEQ_USER_ID    VARCHAR2(30)
 DEQ_TXN_ID    VARCHAR2(30)
 RETRY_COUNT    NUMBER
 EXCEPTION_QUEUE_OWNER    VARCHAR2(30)
 EXCEPTION_QUEUE    VARCHAR2(30)
 USER_DATA    EMP_T
 ORIGINAL_QUEUE_NAME    VARCHAR2(30)
 ORIGINAL_QUEUE_OWNER    VARCHAR2(30)
 EXPIRATION_REASON    VARCHAR2(18)
 SENDER_NAME    VARCHAR2(30)
 SENDER_ADDRESS    VARCHAR2(1024)
 SENDER_PROTOCOL    NUMBER
 ORIGINAL_MSGID    RAW(16)


SQL> SELECT name, queue_table, enqueue_enabled, dequeue_enabled, user_comment FROM user_queues;

NAME                  QUEUE_TABLE    ENQUEUE  DEQUEUE  USER_COMMENT
--------------------  -------------  -------  -------  ---------------
EMPQUEUE_Q            EMPQUEUE_QT    YES      YES
AQ$_EMPQUEUE_QT_E     EMPQUEUE_QT    NO       NO       exception queue




Step 4: Create a procedure to enqueue the messages (DMLs)


CREATE OR REPLACE PROCEDURE enqueue_emp(payload VARCHAR2) AS
  msg1 emp_t := emp_t(NULL);
  msg_id RAW(16);
  priority NUMBER;
  enqueue_options DBMS_AQ.ENQUEUE_OPTIONS_T;
  message_properties DBMS_AQ.MESSAGE_PROPERTIES_T;
BEGIN
  msg1.msg := payload;
  message_properties.priority := 1;
  DBMS_AQ.ENQUEUE(
    queue_name => 'EMPQUEUE_Q',
    enqueue_options => enqueue_options,
    message_properties => message_properties,
    payload => msg1,
    msgid => msg_id);
END;
/




Step 5: Create a procedure to dequeue the messages (DMLs); this will be used to load the data into MongoDB.




CREATE PROCEDURE dequeue_emp(payload OUT VARCHAR2) AS
  msg1 emp_t := emp_t(NULL);
  msg_id RAW(16);
  dequeue_options DBMS_AQ.DEQUEUE_OPTIONS_T;
  message_properties DBMS_AQ.MESSAGE_PROPERTIES_T;
BEGIN
  DBMS_AQ.DEQUEUE(
    queue_name => 'EMPQUEUE_Q',
    dequeue_options => dequeue_options,
    message_properties => message_properties,
    payload => msg1,
    msgid => msg_id
  );
  payload := msg1.msg;
END;
/
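One caveat: with the default dequeue options, DBMS_AQ.DEQUEUE waits forever when the queue is empty, so a call against an empty queue will hang. The Perl script in Step 8 sidesteps this by counting the queued messages before dequeuing. If you would rather have the call fail fast, a small variation (a sketch, not part of the setup above) is to set the wait option inside dequeue_emp:

  -- inside dequeue_emp, before the DBMS_AQ.DEQUEUE call:
  dequeue_options.wait := DBMS_AQ.NO_WAIT;  -- raises ORA-25228 when the queue is empty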




Step 6: Create a trigger to queue DMLs on table SCOTT.EMP


CREATE OR REPLACE TRIGGER emp_q_trig
  AFTER INSERT OR UPDATE OR DELETE ON emp
  FOR EACH ROW
DECLARE
  msg VARCHAR2(4000);
BEGIN
  IF INSERTING OR UPDATING THEN
    msg := '{"empno":' || :new.empno || ',"ename":"' || :new.ename || '"';
    msg := msg||',"job":"' || :new.job || '","mgr":"' || :new.mgr || '"';
    msg := msg||',"hiredate":"' || :new.HIREDATE || '","sal":"' || :new.sal || '"';
    msg := msg||',"comm":"' || :new.COMM || '","deptno":"' || :new.DEPTNO || '"';
    msg := msg||',"dml_type":"';
    msg := msg|| CASE WHEN INSERTING THEN 'I' ELSE 'U' END || '"}';
  ELSE
    msg := '{"empno":' || :old.empno || ',"dml_type":"D"}';
  END IF; 
  enqueue_emp(msg);
END;
/


Note: the trigger builds the JSON by string concatenation and does not escape embedded quotes in column values; that is fine for this demo, but use a proper JSON serializer for production data.

Step 7: Let's test the AQ setup

SQL> select * from emp;

     EMPNO ENAME      JOB         MGR HIREDATE        SAL       COMM     DEPTNO
---------- ---------- --------- ----- --------- --------- ---------- ----------
      7369 SMITH      CLERK      7902 17-DEC-80       800                    20
      7499 ALLEN      SALESMAN   7698 20-FEB-81      1600        300         30
      7521 WARD       SALESMAN   7698 22-FEB-81      1250        500         30
      7566 JONES      MANAGER    7839 02-APR-81      2975                    20
      7654 MARTIN     SALESMAN   7698 28-SEP-81      1250       1400         30
      7698 BLAKE      MANAGER    7839 01-MAY-81      2850                    30
      7782 CLARK      MANAGER    7839 09-JUN-81      2450                    10
      7788 SCOTT      ANALYST    7566 19-APR-87      3000                    20
      7839 KING       PRESIDENT       17-NOV-81      5000                    10
      7844 TURNER     SALESMAN   7698 08-SEP-81      1500          0         30
      7876 ADAMS      CLERK      7788 23-MAY-87      1100                    20
      7900 JAMES      CLERK      7698 03-DEC-81       950                    30
      7902 FORD       ANALYST    7566 03-DEC-81      3000                    20
      7934 MILLER     CLERK      7782 23-JAN-82      1300                    10

14 rows selected.


SQL> UPDATE EMP SET COMM=200 WHERE EMPNO=7369;


1 ROW UPDATED.


SQL> COMMIT;


COMMIT COMPLETE.


SQL> SELECT COUNT(*) FROM AQ$EMPQUEUE_QT;


  COUNT(*)
----------
1


SQL> DECLARE
  2  VAR1 VARCHAR2(4000);
  3  BEGIN
  4  DEQUEUE_EMP(VAR1);
  5  DBMS_OUTPUT.PUT_LINE('QUEUE OUTPUT:'|| VAR1);
  6  END;
  7  /
QUEUE OUTPUT:{"empno":7369,"ename":"SMITH","job":"CLERK","mgr":"7902","hiredate":"17-DEC-80","sal":"800","comm":"200","deptno":"20","dml_type":"U"}


PL/SQL PROCEDURE SUCCESSFULLY COMPLETED.


SQL> SELECT COUNT(*) FROM AQ$EMPQUEUE_QT;


  COUNT(*)
----------
0


We can see the queue output, and the message count drops back to 0 after the dequeue.

Step 8: Create a Perl script that dequeues the messages into a MongoDB collection called emp.


--------------Script Start------------------
#!/usr/bin/perl


use strict;
use DBI;
use DBD::Oracle;
use MongoDB;
use MongoDB::OID;
use Time::Local;
use JSON;


#Connect to the Oracle DB
my $dbh = DBI->connect("dbi:Oracle:host=hostname1;port=1521;sid=test","scott","tiger")
        or die "Error: " . $DBI::errstr;


#Connect to Mongo DB
my $conn=MongoDB::Connection->new(host => 'hostname1:27020');
print "conn $conn\n";


#Connect to required Mongo DB
my $db=$conn->test;
print "db $db\n";


#Connect to the required Mongo Collection
my $coll=$db->emp;
print "Collection: $coll\n";


# Count the messages in Oracle Queue
my $sql1 = "select count(*) from AQ\$EMPQUEUE_QT";
my $stm1 = $dbh->prepare($sql1);
$stm1->execute();
my $ccc = $stm1->fetchrow_array();
print "Message Count: $ccc\n";


#Fetch the Oracle AQ Queue Data
my $aqdata;
my $sql2 = "BEGIN dequeue_emp(:aqdata); END;";
my $stm2 = $dbh->prepare($sql2);
$stm2->bind_param_inout(":aqdata", \$aqdata,4000);


#Start of the while loop
my $counter = 1;

while ($counter <= $ccc) {
  $stm2->execute();
  print "Counter: $counter\n";
  ## JSON output: {"empno":7369,"ename":"SMITH","job":"CLERK","mgr":"7902","hiredate":"17-DEC-80","sal":"800","comm":"200","deptno":"20","dml_type":"U"}
  my $json = $aqdata;
  my $decoded_json = decode_json($json);
  print $decoded_json->{"dml_type"}, "\n";
  print $decoded_json->{"ename"}, "\n";

  if ($decoded_json->{"dml_type"} eq "I") {
    $coll->insert({empno => $decoded_json->{"empno"}, ename => $decoded_json->{"ename"}, job => $decoded_json->{"job"}, mgr => $decoded_json->{"mgr"}, hiredate => $decoded_json->{"hiredate"}, sal => $decoded_json->{"sal"}, comm => $decoded_json->{"comm"}, deptno => $decoded_json->{"deptno"}});
    print "Insert if \n";
  }
  if ($decoded_json->{"dml_type"} eq "U") {
    $coll->remove({"empno" => $decoded_json->{"empno"}});
    $coll->insert({empno => $decoded_json->{"empno"}, ename => $decoded_json->{"ename"}, job => $decoded_json->{"job"}, mgr => $decoded_json->{"mgr"}, hiredate => $decoded_json->{"hiredate"}, sal => $decoded_json->{"sal"}, comm => $decoded_json->{"comm"}, deptno => $decoded_json->{"deptno"}});
    print "Update if \n";
  }
  if ($decoded_json->{"dml_type"} eq "D") {
    $coll->remove({"empno" => $decoded_json->{"empno"}});
    print "Delete if \n";
  }
  $counter++;
}


--------------Script END------------------


Step 9: Executing and testing the data replication


SQL> update emp set comm=200 where empno=7369;


1 row updated.


SQL> commit;


Commit complete.


SQL> select count(*) from aq$EMPQUEUE_QT;


  COUNT(*)
----------
1


###Executing the perl script


[oracle@hostname1 perlstuff]$ perl o2m_emp.pl 
conn MongoDB::Connection=HASH(0x1b45c630)
db MongoDB::Database=HASH(0x1b930580)
Collection: MongoDB::Collection=HASH(0x1c1c5210)
Message Count: 1
Counter: 1
U
SMITH
Update if


####Checking the data in mongoDB
[oracle@hostname1 ~]$ mongo --port 27020
MongoDB shell version: 2.0.5
connecting to: 127.0.0.1:27020/test
> use test
switched to db test
> show collections
emp
system.indexes
> db.emp.find()
{ "_id" : ObjectId("4fc016298b93a68269000000"), "comm" : "200", "sal" : "800", "job" : "CLERK", "mgr" : "7902", "hiredate" : "17-DEC-80", "deptno" : "20", "ename" : "SMITH", "empno" : NumberLong(7369) }
>




Step 10: Seed existing data to MongoDB from Oracle

We had 14 existing rows in the SCOTT.EMP table. We might want to load those rows into MongoDB before starting the replication.
Spool the output of the SQL below (it generates one Perl insert call per row):

sqlplus scott/tiger


select '$collection->insert({"EMPNO"=>'||empno||',"ENAME"=>"'||ename||'","JOB"=>"'||job||'","MGR"=>"'||mgr||'","HIREDATE"=>"'||hiredate||'","SAL"=>"'||sal||'","COMM"=>"'||comm||'","DEPTNO"=>"'||deptno||'"});' from emp;
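A minimal spool wrapper for that query (a sketch; adjust linesize to taste):

set heading off feedback off pagesize 0 linesize 400 trimspool on
spool emp_seed.lst
-- run the SELECT above
spool off

The spooled lines then become the insert calls in the seed script below.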


### Create a perl script to load these into MongoDB:
--------------Script Start------------------
#!/usr/bin/perl


use strict;
use MongoDB;
use MongoDB::OID;
use Time::Local;


my $conn=MongoDB::Connection->new(host => 'hostname1:27020');


print "conn $conn\n";


my $db=$conn->test;


print "db $db\n";


my $collection=$db->emp;


print "collection $collection\n";


my $cursor=$collection->find;


print "cursor $cursor\n\n";


$collection->insert({"EMPNO"=>7369,"ENAME"=>"SMITH","JOB"=>"CLERK","MGR"=>"7902","HIREDATE"=>"17-DEC-80","SAL"=>"800","COMM"=>"200","DEPTNO"=>"20"});
$collection->insert({"EMPNO"=>7499,"ENAME"=>"ALLEN","JOB"=>"SALESMAN","MGR"=>"7698","HIREDATE"=>"20-FEB-81","SAL"=>"1600","COMM"=>"300","DEPTNO"=>"30"});
$collection->insert({"EMPNO"=>7521,"ENAME"=>"WARD","JOB"=>"SALESMAN","MGR"=>"7698","HIREDATE"=>"22-FEB-81","SAL"=>"1250","COMM"=>"500","DEPTNO"=>"30"});
$collection->insert({"EMPNO"=>7566,"ENAME"=>"JONES","JOB"=>"MANAGER","MGR"=>"7839","HIREDATE"=>"02-APR-81","SAL"=>"2975","COMM"=>"","DEPTNO"=>"20"});
$collection->insert({"EMPNO"=>7654,"ENAME"=>"MARTIN","JOB"=>"SALESMAN","MGR"=>"7698","HIREDATE"=>"28-SEP-81","SAL"=>"1250","COMM"=>"1400","DEPTNO"=>"30"});
$collection->insert({"EMPNO"=>7698,"ENAME"=>"BLAKE","JOB"=>"MANAGER","MGR"=>"7839","HIREDATE"=>"01-MAY-81","SAL"=>"2850","COMM"=>"","DEPTNO"=>"30"});
$collection->insert({"EMPNO"=>7782,"ENAME"=>"CLARK","JOB"=>"MANAGER","MGR"=>"7839","HIREDATE"=>"09-JUN-81","SAL"=>"2450","COMM"=>"","DEPTNO"=>"10"});
$collection->insert({"EMPNO"=>7788,"ENAME"=>"SCOTT","JOB"=>"ANALYST","MGR"=>"7566","HIREDATE"=>"19-APR-87","SAL"=>"3000","COMM"=>"","DEPTNO"=>"20"});
$collection->insert({"EMPNO"=>7839,"ENAME"=>"KING","JOB"=>"PRESIDENT","MGR"=>"","HIREDATE"=>"17-NOV-81","SAL"=>"5000","COMM"=>"","DEPTNO"=>"10"});
$collection->insert({"EMPNO"=>7844,"ENAME"=>"TURNER","JOB"=>"SALESMAN","MGR"=>"7698","HIREDATE"=>"08-SEP-81","SAL"=>"1500","COMM"=>"0","DEPTNO"=>"30"});
$collection->insert({"EMPNO"=>7876,"ENAME"=>"ADAMS","JOB"=>"CLERK","MGR"=>"7788","HIREDATE"=>"23-MAY-87","SAL"=>"1100","COMM"=>"","DEPTNO"=>"20"});
$collection->insert({"EMPNO"=>7900,"ENAME"=>"JAMES","JOB"=>"CLERK","MGR"=>"7698","HIREDATE"=>"03-DEC-81","SAL"=>"950","COMM"=>"","DEPTNO"=>"30"});
$collection->insert({"EMPNO"=>7902,"ENAME"=>"FORD","JOB"=>"ANALYST","MGR"=>"7566","HIREDATE"=>"03-DEC-81","SAL"=>"3000","COMM"=>"","DEPTNO"=>"20"});
$collection->insert({"EMPNO"=>7934,"ENAME"=>"MILLER","JOB"=>"CLERK","MGR"=>"7782","HIREDATE"=>"23-JAN-82","SAL"=>"1300","COMM"=>"","DEPTNO"=>"10"});
--------------Script END------------------


DEMO for AQ: http://psoug.org/reference/aq_demo1.html

MongoDB Install and Quick Setup


1.) Download MongoDB from - http://www.mongodb.org/downloads
Copy it over to your host and unpack it.
That's all it takes to install MongoDB.

2.) Create directory for storing your data

mkdir /home/oracle/data
You can create this directory anywhere on the host/filer/shared storage.

3.) Start mongoDB (http://www.mongodb.org/display/DOCS/Starting+and+Stopping+Mongo#StartingandStoppingMongo-RunningasaDaemon)

cd to the bin directory inside the unpacked MongoDB software.
cd /home/oracle/mongodb/bin

run ./mongod

By default it will start mongoDB on port 27017.

Startup options:

If you want to change the port, start it as:
./mongod --port 12345

To specify the data location:

./mongod --dbpath /home/oracle/data --port 12345

To specify the log file path (stores the DB log, like the alert log in Oracle):

./mongod --dbpath /home/oracle/data --port 12345 --logpath /home/oracle/logs/mongo.log

To run it as a daemon

./mongod --dbpath /home/oracle/data --port 12345 --fork --logpath /home/oracle/logs/mongo.log

To get all the command-line options (http://www.mongodb.org/display/DOCS/Command+Line+Parameters)

./mongod -h

4.) Login/Connect to the mongoDB database instance

./mongo --port <port-number>

For example:
./mongo --port 12345

Note: simply starting multiple mongod processes on different ports (each with its own data directory) gives you multiple independent mongoDB instances.

You can now run mongoDB commands

./mongo --port 12345
PRIMARY> use admin
switched to db admin
PRIMARY> show dbs --> Shows the databases.
local 10.1982421875GB
test 0.203125GB
PRIMARY> db.help() --> Gives the list of admin commands
DB methods:
db.addUser(username, password[, readOnly=false])
db.auth(username, password)
db.cloneDatabase(fromhost)
db.commandHelp(name) returns the help for the command
db.copyDatabase(fromdb, todb, fromhost)
db.createCollection(name, { size : ..., capped : ..., max : ... } )
db.currentOp() displays the current operation in the db
db.dropDatabase()
db.eval(func, args) run code server-side
db.getCollection(cname) same as db['cname'] or db.cname
db.getCollectionNames()
db.getLastError() - just returns the err msg string
db.getLastErrorObj() - return full status object
db.getMongo() get the server connection object
db.getMongo().setSlaveOk() allow this connection to read from the nonmaster member of a replica pair
db.getName()
db.getPrevError()
db.getProfilingLevel() - deprecated
db.getProfilingStatus() - returns if profiling is on and slow threshold 
db.getReplicationInfo()
db.getSiblingDB(name) get the db at the same server as this one
db.isMaster() check replica primary status
db.killOp(opid) kills the current operation in the db
db.listCommands() lists all the db commands
db.logout()
db.printCollectionStats()
db.printReplicationInfo()
db.printSlaveReplicationInfo()
db.printShardingStatus()
db.removeUser(username)
db.repairDatabase()
db.resetError()
db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }
db.serverStatus()
db.setProfilingLevel(level,) 0=off 1=slow 2=all
db.shutdownServer()
db.stats()
db.version() current version of the server
db.getMongo().setSlaveOk() allow queries on a replication slave server
db.fsyncLock() flush data to disk and lock server for backups
db.fsyncUnock() unlocks server following a db.fsyncLock()


PRIMARY> use test  --> Change database
switched to db test
PRIMARY> show collections ---> Query the tables/collections in the database
customer
person
system.indexes
PRIMARY> db.customer.count() ---> select the row count (in oracle - select count(*) from customer;)
49996

5.) Shutdown MongoDB

[oracle@host1 ~]$ ps -ef|grep mongo
oracle   11899     1  0 12:25 ?        00:00:00 mongod --port 12345 --dbpath /home/oracle/test --rest --fork --logpath /home/oracle/test/mongo.log
oracle   11912 11755  0 12:25 pts/0    00:00:00 grep mongo
[oracle@host1 ~]$ mongo --port 12345
MongoDB shell version: 2.0.5
connecting to: 127.0.0.1:12345/test
> use admin
switched to db admin
> db.shutdownServer()
Fri May 25 12:26:04 DBClientCursor::init call() failed
Fri May 25 12:26:04 query failed : admin.$cmd { shutdown: 1.0 } to: 127.0.0.1:12345
server should be down...
Fri May 25 12:26:04 trying reconnect to 127.0.0.1:12345
Fri May 25 12:26:04 reconnect 127.0.0.1:12345 ok
Fri May 25 12:26:04 Socket say send() errno:104 Connection reset by peer 127.0.0.1:12345
Fri May 25 12:26:04 Error: error doing query: unknown shell/collection.js:151
> exit
bye
[oracle@host1 ~]$ ps -ef|grep mongo
oracle   11919 11755  0 12:26 pts/0    00:00:00 grep mongo
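Alternatively, mongod itself can request a clean shutdown of a running instance (assuming it was started with a --dbpath, as above; adjust the path to yours):

./mongod --dbpath /home/oracle/test --shutdown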


Thursday, May 24, 2012

How to display Oracle PL/SQL "OUT" variable in perl


Step 1: Create a procedure in oracle with OUT variable

create procedure testing (timendate out varchar2)
as
begin
select systimestamp into timendate from dual;
end;
/

Note: the procedure above returns systimestamp.

Step 2: Test the proc output in Oracle

set serveroutput on
declare
time varchar2(200);
begin
testing(time);
dbms_output.put_line('Returned: ' ||time);
end;
/

The output should be like:
Returned: 23-MAY-12 03.20.00.502124 PM -07:00

PL/SQL procedure successfully completed.

Step 3: Get this output from a perl script


#!/usr/bin/perl


#Adding the Oracle DBD Perl modules
use DBI;
use DBD::Oracle;


#Defining the connection to the Oracle DB
my $dbh = DBI->connect("dbi:Oracle:host=hostname;port=1521;sid=test","system","oracle")
        or die "Error: " . $DBI::errstr;


#Perl variable for storing the PL SQL Proc output.
my $proc_out;


#Preparing the Oracle statement to be executed. ":proc_out" is the Oracle bind variable.
my $sql = "BEGIN testing(:proc_out); END;";
my $stm = $dbh->prepare($sql);


#bind_param_inout takes a REFERENCE to the program variable as an argument. It passes a pointer to the variable to Oracle, not the variable itself.


$stm->bind_param_inout(":proc_out", \$proc_out,400);


#execute the SQL Statement
$stm->execute()
  or die "Database Error: " . $stm->errstr;


#Print the output
print "Current System Time Is: $proc_out\n";




Output should be like:

Current System Time Is: 23-MAY-12 03.25.55.364674 PM -07:00


Detailed reference from http://www.dba-oracle.com/t_dbi_perl_bind_calls.htm

Saturday, May 19, 2012

Install & Configure Perl DBD for Oracle 11.2 on Redhat Linux


Step 1: Install Oracle 11.2 server/client on the host

You can download oracle from download.oracle.com

Step 2: Install Perl on the host.

Check if perl is already installed on the host:
# perl -v
Most hosts have perl already installed.
You can download and install perl from http://www.perl.org/get.html.

Step 3: Download PERL DBD-Oracle 


Download link: http://search.cpan.org/~pythian/DBD-Oracle-1.44/

Step 4: gunzip and untar the downloaded DBD-Oracle

[oracle@host1 tmp]$ gunzip DBD-Oracle-1.44.tar.gz
[oracle@host1 tmp]$ tar -xvf DBD-Oracle-1.44.tar

Step 5: Create file oci.conf 

Create file oci.conf at /etc/ld.so.conf.d/ as root, containing the Oracle library path (the LD_LIBRARY_PATH location).

For this example:
ORACLE_HOME=/u01/product/11.2
PATH=$PATH:$ORACLE_HOME/bin
LD_LIBRARY_PATH=/u01/product/11.2/lib

[root@host1 ~]$ more /etc/ld.so.conf.d/oci.conf
/u01/product/11.2/lib

Run ldconfig to update ld.so.conf

[root@host1 ld.so.conf.d]# ldconfig -v

What is ldconfig (from the man pages)
"DESCRIPTION: ldconfig  creates  the  necessary links and cache to the most recent shared libraries found in the directories specified on the command line, in the file /etc/ld.so.conf, and in the trusted directories (/lib and /usr/lib).  The cache is used by the run-time linker, ld.so or ld-linux.so.  ldconfig checks the header and filenames of the libraries it encounters when determining which versions should have their links updated."
For more details: http://linux.die.net/man/8/ldconfig

Step 6: Install DBD-Oracle

Go to the directory where you untarred the downloaded DBD-Oracle.
Note: make sure you have completed Step 5.

[root@host1 DBD-Oracle-1.44]# perl Makefile.PL -V 11.2.0
[root@host1 DBD-Oracle-1.44]# make install
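If you want to be more cautious, the usual CPAN build sequence also works (a sketch; make test needs ORACLE_HOME set and a reachable database, typically supplied via the ORACLE_USERID environment variable, and can be skipped):

export ORACLE_HOME=/u01/product/11.2
perl Makefile.PL -V 11.2.0
make
make test
make install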

This should complete the DBD-Oracle installation!

Step 7: Test the install

Log back in as the oracle user.
Create a script (dbd_oracle_test.pl) with the text below:


#!/usr/bin/perl


$host="host1";
$ora_listener="LISTENER";
$oracle_sid="test";
$listener_port="1521";
$ora_user="system";
$ora_password="oracle";
$db_table="dba_users";


use DBI;
use DBD::Oracle;


my $dbh = DBI->connect("dbi:Oracle:host=$host;port=$listener_port;sid=$oracle_sid",$ora_user, $ora_password)
  or die "Error Connecting to Oracle : " . DBI->errstr;


my $stm = $dbh->prepare("SELECT username,created FROM $db_table")
  or die "Database Error: " . $dbh->errstr;


$stm->execute()
  or die "Database Error: " . $stm->errstr;


while (( $username,$created ) = $stm->fetchrow_array() )
{
print "\n";
print " Username: $username\n";
print " Created Date: $created\n";
print "\n";
}


$stm->finish;


$dbh->disconnect;
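Run it as the oracle user; it should print the username and created date for every row in dba_users:

[oracle@host1 ~]$ perl dbd_oracle_test.pl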