GNU screen Command

The screen utility can be used to start a session that persists even if the terminal connection is lost, and it also lets you log in from another computer and continue working simply by attaching to the screen session by name.

 
Here are some essential commands to know:

Start new session with a name and a log file

screen -S <name> -L

List running sessions

screen -ls

Attach to a running session

screen -r <name>

Detach

Ctrl-a d

Detach (if attached elsewhere), then reattach or create

screen -d -R <name>

 
 
The “screen -S <name> -L” command will create a session with the specified name and also create a log file with the naming convention screenlog.<#>.
 
Once you have your session open, you can start a long running process, and then exit the session via the detach command.
 
When you are ready to reconnect, you can do so using the “screen -r <name>” command.  If you didn’t specify a name, or forgot the name, you can run “screen -ls” to show a listing of the screen sessions.
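
As a quick end-to-end sketch (the session name and long-running command below are just placeholders):

# start a named session with logging (writes screenlog.0 in the current directory)
screen -S nightly_load -L

# inside the session, kick off the long-running work
./run_nightly_load.sh

# detach with Ctrl-a d, log off, and later reattach from any machine
screen -ls
screen -r nightly_load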
 
 
 

GoldenGate 12 – Classic, Coordinated and Integrated Replicat

I. New Replicat Modes Available in GoldenGate 12.1

Prior to the release of version 12.1, the replicat applied transactions to the target database serially.  To achieve parallelism, tables could be split across multiple replicats so that data was applied more quickly and lag was reduced.  The drawbacks to this approach were the manual configuration of multiple replicats and trail files, and the limitation of not being able to split tables with foreign key relationships across replicats.

The new Coordinated and Integrated Replicats let you apply transactions in parallel on the target database without having to manually split tables into multiple replicats.

Let’s take a look at the difference in the replicat modes.

A. Classic Replicat


Applies SQL serially.

Must be used if the database version is < 11.2.0.4.

B. Coordinated Replicat


The configured replicat is called the coordinator.  It starts additional replicats based on the number of threads specified.  The replicat name is limited to 5 characters, and the threaded replicats have this name appended with a 3-digit number (RGGMA, RGGMA001, RGGMA002, …).  The coordinator sends SQL to the threads, which can apply it in parallel.  The SQL is applied in committed order.
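
For illustration only (the table and column names below are made up), work can be directed to specific apply threads in the MAP statements with the THREAD or THREADRANGE options:

-- send a small table to a single apply thread
MAP scott.small_tab, TARGET scott.small_tab, THREAD (1);
-- spread a large table across threads 1-3, partitioned on the ID column
MAP scott.big_tab, TARGET scott.big_tab, THREADRANGE (1-3, ID);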

C. Integrated Replicat


  • Constructs logical change records (LCRs) that represent source database DML transactions (in committed order). DDL is applied directly by Replicat. Transactions can be applied in parallel (see the parameter sketch after this list).
  • Attaches to a background process in the target database, known as a database inbound server, by means of a lightweight streaming interface.
  • Transmits the LCRs to the inbound server, which applies the data to the target database.
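
The parallelism of the inbound server is controlled in the replicat parameter file through DBOPTIONS INTEGRATEDPARAMS; a minimal sketch (the value of 4 is arbitrary):

REPLICAT rgga3
...
DBOPTIONS INTEGRATEDPARAMS (parallelism 4)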

II. Configuration

Let’s set up one extract that will feed a heartbeat table into the three different replicat types. Here is a diagram.

[Diagram: GG12 Replicat Modes]

A. Parameter Files

Integrated Capture

EXTRACT cgga
SETENV (ORACLE_HOME="/u01/app/oracle/product/11.2.0.4")
SETENV (ORACLE_SID="ORA11204")
SETENV (NLS_LANG="AMERICAN_AMERICA.WE8MSWIN1252")
USERID c##ogg, PASSWORD ogg
TRANLOGOPTIONS MININGUSER c##ogg, MININGPASSWORD ogg
TRANLOGOPTIONS INTEGRATEDPARAMS (max_sga_size 200, parallelism 1)
LOGALLSUPCOLS
DISCARDFILE ./dirrpt/CGGA.dsc, APPEND, MEGABYTES 100
INCLUDE ./dirprm/gg_hb_capture.mac
EXTTRAIL ./dirdat/GA
#gg_hb_capture ();

Pump

EXTRACT pgga
SETENV (ORACLE_HOME="/u01/app/oracle/product/11.2.0.4")
SETENV (ORACLE_SID="ORA11204")
SETENV (NLS_LANG="AMERICAN_AMERICA.WE8MSWIN1252")
USERID c##ogg, PASSWORD ogg
DISCARDFILE ./dirrpt/PGGA.dsc, APPEND, MEGABYTES 100
RMTHOST db1.vm.com, MGRPORT 7812
INCLUDE ./dirprm/gg_hb_pump.mac
--
RMTTRAIL ./dirdat/a1
NOPASSTHRU
#gg_hb_pump ();
--
RMTTRAIL ./dirdat/a2
NOPASSTHRU
#gg_hb_pump ();
--
RMTTRAIL ./dirdat/a3
NOPASSTHRU
#gg_hb_pump ();

Classic Replicat

REPLICAT rgga1
SETENV (ORACLE_HOME="/u01/app/oracle/product/12.1.0.1")
SETENV (ORACLE_SID="ORA12101")
SETENV (NLS_LANG="AMERICAN_AMERICA.WE8MSWIN1252")
UserId c##ogg@pdbogg, Password ogg
ASSUMETARGETDEFS
DBOPTIONS NOSUPPRESSTRIGGERS
DBOPTIONS DEFERREFCONST
DISCARDFILE ./dirrpt/RGGA1.dsc, append, megabytes 100
INCLUDE ./dirprm/gg_hb_classic_replicat.mac
#gg_hb_classic_replicat ();

Coordinated Replicat

REPLICAT rgga2
SETENV (ORACLE_HOME="/u01/app/oracle/product/12.1.0.1")
SETENV (ORACLE_SID="ORA12101")
SETENV (NLS_LANG="AMERICAN_AMERICA.WE8MSWIN1252")
UserId c##ogg@pdbogg, Password ogg
ASSUMETARGETDEFS
DBOPTIONS NOSUPPRESSTRIGGERS
DBOPTIONS DEFERREFCONST
DISCARDFILE ./dirrpt/RGGA2.dsc, append, megabytes 100
INCLUDE ./dirprm/gg_hb_coordinated_replicat.mac
#gg_hb_coordinated_replicat (1);

Integrated Replicat

REPLICAT rgga3
SETENV (ORACLE_HOME="/u01/app/oracle/product/12.1.0.1")
SETENV (ORACLE_SID="ORA12101")
SETENV (NLS_LANG="AMERICAN_AMERICA.WE8MSWIN1252")
UserId c##ogg@pdbogg, Password ogg
ASSUMETARGETDEFS
DBOPTIONS NOSUPPRESSTRIGGERS
DBOPTIONS DEFERREFCONST
DISCARDFILE ./dirrpt/RGGA3.dsc, append, megabytes 100
INCLUDE ./dirprm/gg_hb_integrated_replicat.mac
#gg_hb_integrated_replicat ();

Macros

$ cat gg_hb_classic_replicat.mac
MACRO #gg_hb_classic_replicat
BEGIN
MAP ggmon.ggmon_heartbeat, TARGET ggmon.classic_heartbeat, &
 COLMAP (USEDEFAULTS &
 , SOURCE_COMMIT = @GETENV ('GGHEADER', 'COMMITTIMESTAMP') &
 , TARGET_COMMIT = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 , CAP_SOURCE_COMMIT = @TOKEN('CAP_SOURCE_COMMIT') &
 , CAPGROUP = @TOKEN ('CAPGROUP') &
 , CAPTIME = @TOKEN ('CAPTIME') &
 , PMPGROUP = @TOKEN ('PMPGROUP') &
 , PMPTIME = @TOKEN ('PMPTIME') &
 , REPGROUP = @GETENV ('GGENVIRONMENT', 'GROUPNAME') &
 , REPTIME = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 );
MAP ggmon.ggmon_heartbeat, TARGET ggmon.classic_heartbeat_history, &
 INSERTALLRECORDS &
 , COLMAP (USEDEFAULTS &
 , SOURCE_COMMIT = @GETENV ('GGHEADER', 'COMMITTIMESTAMP') &
 , TARGET_COMMIT = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 , CAP_SOURCE_COMMIT = @TOKEN('CAP_SOURCE_COMMIT') &
 , CAPGROUP = @TOKEN ('CAPGROUP') &
 , CAPTIME = @TOKEN ('CAPTIME') &
 , PMPGROUP = @TOKEN ('PMPGROUP') &
 , PMPTIME = @TOKEN ('PMPTIME') &
 , REPGROUP = @GETENV ('GGENVIRONMENT', 'GROUPNAME') &
 , REPTIME = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 );
END;

$ cat gg_hb_coordinated_replicat.mac
MACRO #gg_hb_coordinated_replicat
PARAMS (#thread)
BEGIN
MAP ggmon.ggmon_heartbeat, TARGET ggmon.coordinated_heartbeat, THREAD (#thread) &
 , &
 COLMAP (USEDEFAULTS &
 , SOURCE_COMMIT = @GETENV ('GGHEADER', 'COMMITTIMESTAMP') &
 , TARGET_COMMIT = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 , CAP_SOURCE_COMMIT = @TOKEN('CAP_SOURCE_COMMIT') &
 , CAPGROUP = @TOKEN ('CAPGROUP') &
 , CAPTIME = @TOKEN ('CAPTIME') &
 , PMPGROUP = @TOKEN ('PMPGROUP') &
 , PMPTIME = @TOKEN ('PMPTIME') &
 , REPGROUP = @GETENV ('GGENVIRONMENT', 'GROUPNAME') &
 , REPTIME = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 );
MAP ggmon.ggmon_heartbeat, TARGET ggmon.coordinated_heartbeat_history, THREAD (#thread) &
 INSERTALLRECORDS &
 , COLMAP (USEDEFAULTS &
 , SOURCE_COMMIT = @GETENV ('GGHEADER', 'COMMITTIMESTAMP') &
 , TARGET_COMMIT = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 , CAP_SOURCE_COMMIT = @TOKEN('CAP_SOURCE_COMMIT') &
 , CAPGROUP = @TOKEN ('CAPGROUP') &
 , CAPTIME = @TOKEN ('CAPTIME') &
 , PMPGROUP = @TOKEN ('PMPGROUP') &
 , PMPTIME = @TOKEN ('PMPTIME') &
 , REPGROUP = @GETENV ('GGENVIRONMENT', 'GROUPNAME') &
 , REPTIME = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
 );
END;

$ cat gg_hb_integrated_replicat.mac
MACRO #gg_hb_integrated_replicat
BEGIN
MAP ggmon.ggmon_heartbeat, TARGET ggmon.integrated_heartbeat, &
  COLMAP (USEDEFAULTS &
  , SOURCE_COMMIT = @GETENV ('GGHEADER', 'COMMITTIMESTAMP') &
  , TARGET_COMMIT = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
  , CAP_SOURCE_COMMIT = @TOKEN('CAP_SOURCE_COMMIT') &
  , CAPGROUP = @TOKEN ('CAPGROUP') &
  , CAPTIME = @TOKEN ('CAPTIME') &
  , PMPGROUP = @TOKEN ('PMPGROUP') &
  , PMPTIME = @TOKEN ('PMPTIME') &
  , REPGROUP = @GETENV ('GGENVIRONMENT', 'GROUPNAME') &
  , REPTIME =  @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
  );
MAP ggmon.ggmon_heartbeat, TARGET ggmon.integrated_heartbeat_history, &
  INSERTALLRECORDS &
  , COLMAP (USEDEFAULTS &
  , SOURCE_COMMIT = @GETENV ('GGHEADER', 'COMMITTIMESTAMP') &
  , TARGET_COMMIT = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
  , CAP_SOURCE_COMMIT = @TOKEN('CAP_SOURCE_COMMIT') &
  , CAPGROUP = @TOKEN ('CAPGROUP') &
  , CAPTIME = @TOKEN ('CAPTIME') &
  , PMPGROUP = @TOKEN ('PMPGROUP') &
  , PMPTIME = @TOKEN ('PMPTIME') &
  , REPGROUP = @GETENV ('GGENVIRONMENT', 'GROUPNAME') &
  , REPTIME =  @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')) &
  );
END;

B. Creating Extract and Replicats

Oracle 11.2.0.4 Database

DBLogin UserId c##ogg, Password ogg
MiningDBLogin UserId c##ogg, Password ogg
REGISTER EXTRACT cgga DATABASE
ADD    SchemaTranData ggmon

ADD EXTRACT cgga, INTEGRATED TRANLOG, BEGIN NOW, DESC "***** Integrated Capture *****"
ADD EXTTRAIL ./dirdat/GA, EXTRACT cgga

ADD EXTRACT pgga, EXTTRAILSOURCE ./dirdat/GA, desc "***** Pump *****"
ADD RMTTRAIL ./dirdat/a1, EXTRACT pgga
ADD RMTTRAIL ./dirdat/a2, EXTRACT pgga
ADD RMTTRAIL ./dirdat/a3, EXTRACT pgga


GGSCI (db1.vm.com) 9> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     CGGA        00:00:10      00:00:02
Description "***** Integrated Capture *****"
EXTRACT     RUNNING     PGGA        00:00:00      00:00:08
Description "***** Pump *****"

Oracle 12.1.0.1 Database

DBLogin UserId c##ogg, Password ogg
MiningDBLogin UserId c##ogg, Password ogg
DBLogin UserId c##ogg@pdbogg, Password ogg
-- Classic Replicat
ADD REPLICAT rgga1, ExtTrail ./dirdat/a1, checkpointtable pdbogg.c##ogg.checkpoint, desc "***** Classic Replicat *****"
-- Coordinated Replicat
ADD REPLICAT rgga2, Coordinated MaxThreads 2, ExtTrail ./dirdat/a2, checkpointtable pdbogg.c##ogg.checkpoint, desc "***** Coordinated Replicat *****"
-- Integrated Replicat
ADD REPLICAT rgga3, Integrated, ExtTrail ./dirdat/a3, desc "***** Integrated Replicat *****"


GGSCI (db1.vm.com) 8> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     RGGA1       00:00:00      00:00:07
Description "***** Classic Replicat *****"
REPLICAT    RUNNING     RGGA2       00:00:00      00:00:07
Description "***** Coordinated Replicat *****"
REPLICAT    RUNNING     RGGA3       00:00:00      00:00:05
Description "***** Integrated Replicat *****"

C. Push Data Through via Heartbeat Table

Once the heartbeat table is being populated, we can query the target tables to see the data being replicated through each replicat type. The extract and replicat names can be seen in the output. Note that the name of the coordinated replicat group has the three-digit thread number appended to the original name.

Please keep in mind that this configuration was done on a small virtual machine, so the lag times are a bit high; this post is only about demonstrating the functionality of the different replicat types.

GGMON:pdbogg> @all_rep_lag.sql
***********************************************************
Heartbeat
***********************************************************

REPLICAT    CAP                                 LAST              CAP    PMP    REP  TOTAL
TYPE        DB NAME  CAPGROUP PMPGROUP REPGROUP UPDATE            LAG    LAG    LAG    LAG
----------- -------- -------- -------- -------- -------------- ------ ------ ------ ------
Classic     ORA11204 CGGA     PGGA     RGGA1    03/06 02:42:31    5.9    5.3    2.3   13.5
Coordinated ORA11204 CGGA     PGGA     RGGA2001 03/06 02:42:31    5.9    5.3    3.0   14.2
Integrated  ORA11204 CGGA     PGGA     RGGA3    03/06 02:42:31    5.9    5.3   12.6   23.8

***********************************************************
Heartbeat History
***********************************************************

                                                   AVG    AVG    AVG    AVG
REPLICAT    CAP                                    CAP    PMP    REP  TOTAL
TYPE        DB NAME  CAPGROUP PMPGROUP REPGROUP    LAG    LAG    LAG    LAG
----------- -------- -------- -------- -------- ------ ------ ------ ------
Classic     ORA11204 CGGA     PGGA     RGGA1       3.6    5.4    4.1   13.2
Coordinated ORA11204 CGGA     PGGA     RGGA2001    3.6    5.4    2.5   11.6
Integrated  ORA11204 CGGA     PGGA     RGGA3       3.6    5.4   10.0   19.1

A copy of the all_rep_lag.sql script can be found here .

GoldenGate 12 Users, Privileges and Security on Multitenant DBs

I. Users and Privileges

With the release of the latest versions of GoldenGate and the database, there have been many changes in how users are created, how you log in, and which privileges are granted.

A. Database Users

If the source database is a multitenant container database, the Extract user must be a common user and must log into the root container.

One extract capture process can capture from multiple pluggable databases to a single trail.

A replicat process can apply transactions to only one pluggable database.
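
For illustration (the PDB and schema names below are hypothetical), the extract parameter file qualifies objects with the pluggable database name, either inline or with SOURCECATALOG, which is how a single extract can capture from more than one PDB into one trail:

-- capture from two PDBs into the same trail
TABLE pdbsales.app.*;
SOURCECATALOG pdbhr
TABLE hr.*;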

B. Privileges

Click the following link to see a list of privileges required for a GoldenGate user on an 11.2.0.4 or later database.

User Privileges for GoldenGate Users for DB 11.2.0.4 or Higher

CREATE USER c##ogg IDENTIFIED BY ogg
DEFAULT TABLESPACE ogg_tbsp
TEMPORARY TABLESPACE temp
PROFILE DEFAULT ACCOUNT UNLOCK
CONTAINER=ALL;

ALTER USER c##ogg DEFAULT ROLE ALL CONTAINER=ALL;
ALTER USER c##ogg QUOTA UNLIMITED ON ogg_tbsp CONTAINER=ALL;

GRANT CREATE SESSION TO c##ogg CONTAINER=ALL;
GRANT CONNECT TO c##ogg CONTAINER=ALL;
GRANT RESOURCE TO c##ogg CONTAINER=ALL;
GRANT ALTER SYSTEM TO c##ogg CONTAINER=ALL;
GRANT DBA TO c##ogg CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO c##ogg CONTAINER=ALL;
GRANT ALTER ANY TABLE TO c##ogg CONTAINER=ALL;

BEGIN
 dbms_goldengate_auth.grant_admin_privilege
 (
 grantee => 'C##OGG',
 privilege_type => '*',
 grant_select_privileges => TRUE,
 do_grants => TRUE,
 container => 'ALL'
 );
END;
/

C. Logins

1. Capture DB Login

DBLogin UserId c##ogg, Password ogg

2. Pluggable DB Login

DBLogin UserId c##ogg@pdbogg, Password ogg

II. Security

A. Credential Store

The credential store is used to maintain encrypted database passwords, associating an alias with the userid.

ADD CREDENTIALSTORE

ALTER CREDENTIALSTORE ADD USER c##ogg PASSWORD ogg ALIAS ogg_user

ALTER CREDENTIALSTORE ADD USER c##ogg@pdbogg PASSWORD ogg ALIAS pdbogg_user

B. Logins

1. Capture DB Login

DBLOGIN UserIdAlias ogg_user

2. Pluggable DB Login

DBLOGIN UserIdAlias pdbogg_user
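
Once the aliases exist, the extract and replicat parameter files can reference them instead of embedding passwords; a minimal sketch using the aliases created above:

-- capture side
USERIDALIAS ogg_user

-- apply side (pluggable database)
USERIDALIAS pdbogg_user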

Mapping problem with compressed update record – Key column missing from update

Here’s an example of an update statement failing on the target database on a table without a primary or unique key.

Since there are no keys defined on the table, the replicat will consider all the columns as key columns.

The update statement that was processed by the extract only logged the column values that were changed along with the columns defined by the keycols parameter in the table statement, in this case the objectid column.

The replicat ABENDed since it did not have values for all the columns. Here’s the discard file, which shows the specifics on the columns and the error message.

Oracle GoldenGate Delivery for Oracle process started, group RWPC discard file opened: 2013-09-08 11:21:51

Key column ROV_RNAME (354) is missing from update on table WPC.WPCTBL
Key column A_RCPT_DATE (355) is missing from update on table WPC.WPCTBL
Key column A_SCAN_TIME (356) is missing from update on table WPC.WPCTBL
Key column DOCTYPE (357) is missing from update on table WPC.WPCTBL
Key column ORIGDATE (358) is missing from update on table WPC.WPCTBL
Key column STRACKING (359) is missing from update on table WPC.WPCTBL
Key column M_IND (360) is missing from update on table WPC.WPCTBL
Key column NCOB (361) is missing from update on table WPC.WPCTBL
Key column RET_CBT (362) is missing from update on table WPC.WPCTBL
Key column M_COUNTER (363) is missing from update on table WPC.WPCTBL
Key column IS_DOC (364) is missing from update on table WPC.WPCTBL
Missing 11 key columns in update for table WPC.WPCTBL.
Current time: 2013-09-08 11:21:53
Discarded record from action ABEND on error 0

Aborting transaction on /u01/app/oracle/product/golden_gate/dirdat/wP beginning at seqno 76 rba 695700
                         error at seqno 76 rba 973932
Problem replicating WPC.WPCTBL to WPC.WPCTBL
Mapping problem with compressed update record (target format)...
*
CATEGORYID = 3000
OBJECTID = 1892683845

Doing some research on My Oracle Support turned up the following two notes that were used to come up with a solution.

Replicat Abending With Mapping Error and discard file shows Missing Key Columns (Doc ID 1276538.1)
Can I Use A Wildcard In For All Of The Tables Except One so that I may describe it separately? (Doc ID 970419.1)

Two parameters were put in the replicat parameter file to work around the issue.

DYNAMICRESOLUTION - enables fast process startup when there are numerous tables specified in TABLE or MAP statements

WILDCARDRESOLVE IMMEDIATE - objects are processed at startup

Here’s a snippet of the replicat parameter file with the new mappings. The replicat is able to process the transaction after the change, since the KEYCOLS parameter specifies objectid as the table’s key column. Hence, GG updates the columns present in the trail record and leaves the columns mentioned in the discard file unchanged.

> cat rwpc.prm 
REPLICAT rwpc
...
-- ------------------------------------------------------------------
-- mappings
-- ------------------------------------------------------------------
WILDCARDRESOLVE IMMEDIATE
MAP WPC.WPCTBL, TARGET WPC.WPCTBL, KEYCOLS (objectid);
DYNAMICRESOLUTION
MAPEXCLUDE WPC.WPCTBL;
MAP WPC.*, TARGET WPC.*;

GoldenGate Process Memory Usage

From the GG Installation documentation:

Each Extract and Replicat process needs approximately 25-55 MB of memory, or more depending on the size of the transactions and the number of concurrent transactions.

So let’s see some examples of how much memory is being used. I wrote a short script that shows memory usage via “ps”, and I have two examples, one on Linux and one on Solaris.

The first example (on Linux) shows the usage of GoldenGate extract and replicat processes configured to process a heartbeat table, so there is not very much activity.

The second example (on Solaris) shows the usage of GoldenGate processes on a live system with lots of activity.

Due to differences in how ps behaves between Linux and Solaris, I display either the group name itself or the parameter file of the group, which contains the group name. The Unix script can be found at the bottom of this post.

Here’s the example running on Linux. This configuration does not have very many transactions.


> ./gg_memory_usage.sh
OSNAME = Linux

#####################################
# Individual extract Process Usage #
#####################################
48.3516 MB CGGMONX
48.6094 MB CGGMONY
31.6055 MB PGGMONX
31.6055 MB PGGMONY

#####################################
#   Total extract Process Usage    #
#####################################
Number of processes      = 4
AVG Memory usage/process = 40.043 MB
Total memory usage       = 160.172  MB

#####################################
# Individual replicat Process Usage #
#####################################
29.8398 MB RGGMONX
29.832 MB RGGMONY

#####################################
#   Total replicat Process Usage    #
#####################################
Number of processes      = 2
AVG Memory usage/process = 29.8359 MB
Total memory usage       = 59.6719  MB



Here’s the example running on Solaris. This configuration has processed over 11,000 transactions per second at peak times, but normally runs around 5000 transactions per second on average.


oracle@sadcsim1db1[CSOTADB1] $ ./gg_memory_usage.sh
OSNAME = SunOS

#####################################
# Individual extract Process Usage #
#####################################
7565.41 MB /ggs/dirprm/e1.prm
5435.1 MB /ggs/dirprm/e2.prm
160.758 MB /ggs/dirprm/p1.prm
156.758 MB /ggs/dirprm/p2.prm
164.758 MB /ggs/dirprm/p3.prm
160.758 MB /ggs/dirprm/p4.prm
160.758 MB /ggs/dirprm/p5.prm
156.758 MB /ggs/dirprm/p6.prm
164.758 MB /ggs/dirprm/p7.prm
160.758 MB /ggs/dirprm/p8.prm
160.758 MB /ggs/dirprm/p9.prm
156.758 MB /ggs/dirprm/p10.prm
164.758 MB /ggs/dirprm/p11.prm
160.758 MB /ggs/dirprm/p12.prm
160.758 MB /ggs/dirprm/p13.prm
156.758 MB /ggs/dirprm/p14.prm
164.758 MB /ggs/dirprm/p15.prm
160.758 MB /ggs/dirprm/p16.prm

#####################################
#   Total extract Process Usage    #
#####################################
Number of processes      = 18
AVG Memory usage/process = 865.147 MB
Total memory usage       = 15572.6  MB

#####################################
# Individual replicat Process Usage #
#####################################
148.688 MB /ggs/dirprm/r1.prm
144.688 MB /ggs/dirprm/r2.prm
144.688 MB /ggs/dirprm/r3.prm
148.688 MB /ggs/dirprm/r4.prm
144.688 MB /ggs/dirprm/r5.prm
144.688 MB /ggs/dirprm/r6.prm
144.688 MB /ggs/dirprm/r7.prm
144.688 MB /ggs/dirprm/r8.prm
144.688 MB /ggs/dirprm/r9.prm
144.688 MB /ggs/dirprm/r10.prm
144.688 MB /ggs/dirprm/r11.prm
144.688 MB /ggs/dirprm/r12.prm
148.688 MB /ggs/dirprm/r13.prm
144.688 MB /ggs/dirprm/r14.prm
144.688 MB /ggs/dirprm/r15.prm
148.688 MB /ggs/dirprm/r16.prm
144.688 MB /ggs/dirprm/r17.prm
144.688 MB /ggs/dirprm/r18.prm
144.688 MB /ggs/dirprm/r19.prm
144.688 MB /ggs/dirprm/r20.prm
144.688 MB /ggs/dirprm/r21.prm
144.688 MB /ggs/dirprm/r22.prm
144.688 MB /ggs/dirprm/r23.prm
144.688 MB /ggs/dirprm/r24.prm
148.688 MB /ggs/dirprm/r25.prm
144.688 MB /ggs/dirprm/r26.prm
144.688 MB /ggs/dirprm/r27.prm
148.68 MB /ggs/dirprm/r28.prm
144.68 MB /ggs/dirprm/r29.prm
144.68 MB /ggs/dirprm/r30.prm
144.68 MB /ggs/dirprm/r31.prm
144.68 MB /ggs/dirprm/r32.prm
144.68 MB /ggs/dirprm/r33.prm
144.68 MB /ggs/dirprm/r34.prm
144.68 MB /ggs/dirprm/r35.prm
144.68 MB /ggs/dirprm/r36.prm

#####################################
#   Total replicat Process Usage    #
#####################################
Number of processes      = 36
AVG Memory usage/process = 145.352 MB
Total memory usage       = 5232.68  MB

Big difference! If you have a busy system, make sure you have enough memory. You’ll probably have to do your own testing, since the memory requirements will vary depending on both the number of transactions and the size of the transactions.

Here’s the script :


#!/bin/bash

###############################
# determine the OS type
###############################
OSNAME=`uname`

case "$OSNAME" in
  "SunOS")
    echo "OSNAME = $OSNAME"
    ;;
  "Linux")
    echo "OSNAME = $OSNAME"
    ;;
  "*")
    echo "This script has not been verified on $OSNAME"
    exit 1
    ;;
esac

###############################
# set the temp file
###############################
TMPFILE=/tmp/pmem.tmp
if [ -f $TMPFILE ]
then
  rm -f $TMPFILE
fi

################################
# loop over the gg process types
################################
PROCESSES="extract replicat"

for PROCESS in $PROCESSES
do
  FLAG=""
  FLAG=`ps -ef | grep -v grep | grep $PROCESS`
  if [ -z "$FLAG" ]
  then
    echo "No $PROCESS processes found"
  else
    echo
    echo "#####################################"
    echo "# Individual $PROCESS Process Usage #"
    echo "#####################################"
    case "$OSNAME" in
      "Linux")
        ps -C $PROCESS -O rss > $TMPFILE
        cat $TMPFILE | grep $PROCESS | awk '{print $2/1024, "MB", $12}' | sort -k 2
        ;;
      "SunOS")
        ps -efo vsz,uid,pid,ppid,pcpu,args | grep -v grep | grep $PROCESS > $TMPFILE
        cat $TMPFILE | grep $PROCESS | awk '{print $1/1024, "MB", $8}' | sort -k 2
        ;;
      "*")
        echo "This script has not been verified on $OSNAME"
        exit 1
        ;;
    esac
    rm -f $TMPFILE

    echo
    echo "#####################################"
    echo "#   Total $PROCESS Process Usage    #"
    echo "#####################################"
    case "$OSNAME" in
      "Linux")
        ps -C $PROCESS -O rss > $TMPFILE
        cat $TMPFILE | grep $PROCESS | awk '{count ++; sum=sum+$2; } END \
          { print "Number of processes      =",count; \
          print "AVG Memory usage/process =",sum/1024/count, "MB"; \
          print "Total memory usage       =", sum/1024,  " MB"}'
        ;;
      "SunOS")
        ps -efo vsz,uid,pid,ppid,pcpu,comm | grep -v grep | grep $PROCESS > $TMPFILE
        cat $TMPFILE | awk '{count ++; sum=sum+$1; } END \
          { print "Number of processes      =",count; \
          print "AVG Memory usage/process =",sum/1024/count, "MB"; \
          print "Total memory usage       =", sum/1024,  " MB"}'
        ;;
      "*")
        echo "This script has not been verified on $OSNAME"
        exit 1
        ;;
    esac
    rm -f $TMPFILE
  fi
done

exit 0

GoldenGate 12.1.2 Replication from SQL Server to Oracle

This post covers using GoldenGate 12.1.2 to replicate from a SQL Server source database to an Oracle 11gR2 target database.

What this tutorial will cover:

1. OGG installation on Windows
2. OGG installation on Linux using the graphical installer
3. Direct Load method to initially copy the data from SQL Server to Oracle
4. Configuration of the normal live replication from source to target

This tutorial does not cover the installation of SQL Server 2008R2, but it does show the prerequisites for using OGG with SQL Server as well as the OGG 12.1.2 certification matrix.

I’ve included lots of screenshots for the install to hopefully make things a bit easier to follow. You can download it from here.

Good Luck!

GoldenGate Extract Pump process keeps ABENDing – OGG-01031

I encountered an odd situation recently where the extract pump processes kept ABENDing. The report file listed the OGG-01031 error message.

GGSCI (exadb04.new.fastrieve.com) 23> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     CBOA        00:00:00      00:00:02    
EXTRACT     RUNNING     CGGMONX     00:00:01      00:00:05    
EXTRACT     ABENDED     PBOA        00:00:00      01:46:40    
EXTRACT     ABENDED     PGGMONX     00:00:00      01:43:14    
REPLICAT    RUNNING     RGGMONY     00:00:00      00:03:24    


GGSCI (exadb04.new.fastrieve.com) 1> view report pggmonx


***********************************************************************
                 Oracle GoldenGate Capture for Oracle
    Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
   Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:42:16
 
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2013-08-14 16:15:12
***********************************************************************
.
.
.

2013-08-14 16:15:28  ERROR   OGG-01031  There is a problem in network communication, a remote file problem,
encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received
is Unable to open file "./dirdat/gX000017" (error 11, Resource temporarily unavailable)).

I wasn’t able to find very much on this issue, but I did run across MOS note 1318980.1, and it turned out to have the fix in my case. By renaming the associated $GG_HOME/dirchk/*.cps and $GG_HOME/dirchk/*.cpe files (and then renaming them back), I was able to get the processes running again. Note that in this case the $GG_HOME/dirchk directory resides on an NFS mount. Apparently this type of error can occur on shared file systems, causing the $GG_HOME/dirchk entries to become non-updatable.

Extract/Replicat Shows Already Running When Started and Already Stopped When Stopped from GGSCI, OGG-01031, Reply received is Unable to open file (Doc ID 1318980.1)

[exadb04:oracle:FTRVXP4] /u01/app/oracle/product/golden_gate/dirchk 
> ls -lhF
total 696K
-rw-rw-r-- 1 oracle dba  72K Aug 14 16:28 PGGMONX.cpe
-rw-rw-r-- 1 oracle dba 2.2K Aug 14 16:27 PGGMONX.cps

[exadb04:oracle:FTRVXP4] /u01/app/oracle/product/golden_gate/dirchk 
> mv PGGMONX.cpe PGGMONX.cpe_new

[exadb04:oracle:FTRVXP4] /u01/app/oracle/product/golden_gate/dirchk 
> mv PGGMONX.cps PGGMONX.cps_new

[exadb04:oracle:FTRVXP4] /u01/app/oracle/product/golden_gate/dirchk 
> ls -lhF
total 696K
-rw-rw-r-- 1 oracle dba  72K Aug 14 16:28 PGGMONX.cpe_new
-rw-rw-r-- 1 oracle dba 2.2K Aug 14 16:27 PGGMONX.cps_new

[exadb04:oracle:FTRVXP4] /u01/app/oracle/product/golden_gate/dirchk 
> mv PGGMONX.cpe_new PGGMONX.cpe

[exadb04:oracle:FTRVXP4] /u01/app/oracle/product/golden_gate/dirchk 
> mv PGGMONX.cps_new PGGMONX.cps

[exadb04:oracle:FTRVXP4] /u01/app/oracle/product/golden_gate/dirchk 
> ls -lhF
total 696K
-rw-rw-r-- 1 oracle dba  72K Aug 14 16:28 PGGMONX.cpe
-rw-rw-r-- 1 oracle dba 2.2K Aug 14 16:27 PGGMONX.cps

After restarting the processes, everything was caught up within a couple of minutes.

GGSCI (exadb04.new.fastrieve.com) 11> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     CBOA        00:00:07      00:00:04    
Description "***** Capture for Bank of America *****"
EXTRACT     RUNNING     CGGMONX     00:02:21      00:00:01    
Description "***** Capture for GGMON X *****"
EXTRACT     RUNNING     PBOA        00:00:00      00:00:05    
EXTRACT     RUNNING     PGGMONX     00:00:00      00:00:08    
Description "***** Pump for GGMON X *****"
REPLICAT    RUNNING     RGGMONY     00:00:00      00:00:02    

Exadata Flashdisk in Poor Performance Status After Replacement

I recently encountered a bad flashdisk on an Exadata X2-2 machine and had it replaced. That is simple enough, just open a ticket with Oracle and schedule the hardware replacement. However, after the replacement two separate modules went into a “poor performance” state. Oracle support then advised me to follow the note listed below to fix it.

Flash Disks may report ‘Not Present’ or ‘Poor Performance’ after FDOM/Flash Disk Replacement [ID 1306635.1]

The note mentions that it covers Version 11.2.1.2.1 to 11.2.2.2.0 [Release 11.2], but I was on Version 11.2.3.2.1 .

A couple of notes about 1306635.1 as I encountered two errors with the statements.

1) “drop celldisk all flashdisk” command is out of order; “drop flashlog” needs to come first.
2) “drop celldisk all flashlog” is not a valid command

Please be sure to read the entire note as some additional commands may be required depending on the storage software version and whether or not you are in WriteBack or WriteThrough mode.

Here’s the original error with the part that was replaced; FLASH_4_1 reported as critical.

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk
FLASH_1_0 1206M0CY4H normal
FLASH_1_1 1206M0CYL2 normal
FLASH_1_2 1206M0CY43 normal
FLASH_1_3 1206M0CY44 normal
FLASH_2_0 1202M0CN7C normal
FLASH_2_1 1202M0CRLX normal
FLASH_2_2 1202M0CN6D normal
FLASH_2_3 1202M0CQXA normal
FLASH_4_0 1202M0CR8P normal
FLASH_4_1 1202M0CR7P critical
FLASH_4_2 1202M0CRLQ normal
FLASH_4_3 1202M0CMWU normal
FLASH_5_0 1202M0CRJT normal
FLASH_5_1 1202M0CRL1 normal
FLASH_5_2 1202M0CRLT normal
FLASH_5_3 1202M0CRLP normal

The flashdisk in question was successfully replaced ( FLASH_4_1 ) and it reported normal; however, two more issues popped up on flashdisks in the same PCI slot 4 immediately after replacing the flashdisk.

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk
FLASH_1_0 1206M0CY4H normal
FLASH_1_1 1206M0CYL2 normal
FLASH_1_2 1206M0CY43 normal
FLASH_1_3 1206M0CY44 normal
FLASH_2_0 1202M0CN7C normal
FLASH_2_1 1202M0CRLX normal
FLASH_2_2 1202M0CN6D normal
FLASH_2_3 1202M0CQXA normal
FLASH_4_0 1202M0CR8P normal
FLASH_4_1 1101M067DW normal
FLASH_4_2 1202M0CRLQ warning - poor performance
FLASH_4_3 1202M0CMWU warning - poor performance
FLASH_5_0 1202M0CRJT normal
FLASH_5_1 1202M0CRL1 normal
FLASH_5_2 1202M0CRLT normal
FLASH_5_3 1202M0CRLP normal

Following the note that Oracle support gave me, here’s the fix. As I mentioned before, please be sure to read the entire note as some additional commands may be required depending on the storage software version and whether or not you are in WriteBack or WriteThrough mode.

[root@exaxcel07 ~]# cellcli
CellCLI: Release 11.2.3.2.1 - Production on Tue Jun 04 13:22:11 CDT 2013

Copyright (c) 2007, 2012, Oracle.  All rights reserved.
Cell Efficiency Ratio: 6,291

CellCLI> alter lun 4_2 reenable force
LUN 4_2 on physical disk FLASH_4_2 successfully marked to status normal.
LUN 4_2 successfully reenabled.

CellCLI> alter lun 4_3 reenable force
LUN 4_3 on physical disk FLASH_4_3 successfully marked to status normal.
LUN 4_3 successfully reenabled.

CellCLI> drop flashcache
Flash cache exaxcel07_FLASHCACHE successfully dropped

CellCLI> drop flashlog
Flash log exaxcel07_FLASHLOG successfully dropped

CellCLI> drop celldisk all flashdisk
CellDisk FD_00_exaxcel07 successfully dropped
CellDisk FD_01_exaxcel07 successfully dropped
CellDisk FD_02_exaxcel07 successfully dropped
CellDisk FD_03_exaxcel07 successfully dropped
CellDisk FD_04_exaxcel07 successfully dropped
CellDisk FD_05_exaxcel07 successfully dropped
CellDisk FD_06_exaxcel07 successfully dropped
CellDisk FD_07_exaxcel07 successfully dropped
CellDisk FD_08_exaxcel07 successfully dropped
CellDisk FD_09_exaxcel07 successfully dropped
CellDisk FD_10_exaxcel07 successfully dropped
CellDisk FD_11_exaxcel07 successfully dropped
CellDisk FD_12_exaxcel07 successfully dropped
CellDisk FD_13_exaxcel07 successfully dropped
CellDisk FD_14_exaxcel07 successfully dropped
CellDisk FD_15_exaxcel07 successfully dropped

CellCLI> create celldisk all flashdisk
CellDisk FD_00_exaxcel07 successfully created
CellDisk FD_01_exaxcel07 successfully created
CellDisk FD_02_exaxcel07 successfully created
CellDisk FD_03_exaxcel07 successfully created
CellDisk FD_04_exaxcel07 successfully created
CellDisk FD_05_exaxcel07 successfully created
CellDisk FD_06_exaxcel07 successfully created
CellDisk FD_07_exaxcel07 successfully created
CellDisk FD_08_exaxcel07 successfully created
CellDisk FD_09_exaxcel07 successfully created
CellDisk FD_10_exaxcel07 successfully created
CellDisk FD_11_exaxcel07 successfully created
CellDisk FD_12_exaxcel07 successfully created
CellDisk FD_13_exaxcel07 successfully created
CellDisk FD_14_exaxcel07 successfully created
CellDisk FD_15_exaxcel07 successfully created

CellCLI> create flashlog all
Flash log exaxcel07_FLASHLOG successfully created

CellCLI> create flashcache all
Flash cache exaxcel07_FLASHCACHE successfully created

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk
	 FLASH_1_0	 1206M0CY4H	 normal
	 FLASH_1_1	 1206M0CYL2	 normal
	 FLASH_1_2	 1206M0CY43	 normal
	 FLASH_1_3	 1206M0CY44	 normal
	 FLASH_2_0	 1202M0CN7C	 normal
	 FLASH_2_1	 1202M0CRLX	 normal
	 FLASH_2_2	 1202M0CN6D	 normal
	 FLASH_2_3	 1202M0CQXA	 normal
	 FLASH_4_0	 1202M0CR8P	 normal
	 FLASH_4_1	 1101M067DW	 normal
	 FLASH_4_2	 1202M0CRLQ	 normal
	 FLASH_4_3	 1202M0CMWU	 normal
	 FLASH_5_0	 1202M0CRJT	 normal
	 FLASH_5_1	 1202M0CRL1	 normal
	 FLASH_5_2	 1202M0CRLT	 normal
	 FLASH_5_3	 1202M0CRLP	 normal

All is good now.

Exadata Auto Mgmt Process Dropping Disks

I was recently working on an Exadata X2-2 machine that had several disks that were not in ASM. Upon further inspection, the physical disks, celldisks, and griddisks were all fine, but there was an issue trying to add them back into ASM.

I’ll show you the steps provided by Oracle support to get the disks added back into the diskgroups in ASM.

Here, you can see the status of the disks on the storage cells and within ASM.

-- celldisk

name: CD_03_exaucel02
comment:
creationTime: 2011-01-05T16:42:36-06:00
deviceName: /dev/sdd
devicePartition: /dev/sdd
diskType: HardDisk
errorCount: 4
freeSpace: 0
id: 0000012d-5858-9649-0000-000000000000
interleaving: none
lun: 0_3
physicalDisk: L2HTLY
raidLevel: 0
size: 1861.703125G
status: normal

-- griddisk

name: DATA_CD_03_exaucel02
asmDiskgroupName: DATA
asmDiskName: DATA_CD_03_EXAUCEL02
asmFailGroupName: EXAUCEL02
availableTo:
cachingPolicy: default
cellDisk: CD_03_exaucel02
comment:
creationTime: 2011-01-05T16:45:47-06:00
diskType: HardDisk
errorCount: 4
id: 0000012d-585b-811e-0000-000000000000
offset: 32M
size: 1562G
status: active

name: RECO_CD_03_exaucel02
asmDiskgroupName: RECO
asmDiskName: RECO_CD_03_EXAUCEL02
asmFailGroupName: EXAUCEL02
availableTo:
cachingPolicy: default
cellDisk: CD_03_exaucel02
comment:
creationTime: 2011-01-11T10:44:37-06:00
diskType: HardDisk
errorCount: 0
id: 90c29a38-f9d3-405e-8909-d79dcdf5a909
offset: 1562.046875G
size: 299.65625G
status: active


From V$ASM_DISK
SQL:ASM> @asm_info

GROUP_NUMBER FAILGROUP                      PATH                                     MOUNT_STATUS STATE
------------ ------------------------------ ---------------------------------------- ------------ --------
           0 EXAUCEL02                      o/192.168.10.6/DATA_CD_03_exaucel02      CLOSED       NORMAL
           0 EXAUCEL02                      o/192.168.10.6/RECO_CD_03_exaucel02      CLOSED       NORMAL
           0 EXAUCEL07                      o/192.168.10.11/DATA_CD_08_exaucel07     IGNORED      NORMAL
           0 EXAUCEL07                      o/192.168.10.11/RECO_CD_08_exaucel07     IGNORED      NORMAL

The disks had been in this state for some time, as the log files had already aged off. Since the disks seemed to be fine, I moved forward with Oracle support on the assumption that they were healthy and attempted to add them back into the diskgroups in ASM. You can read the following note for reference.

After replacing disk on Exadata storage, v$asm_disk shows CLOSED/IGNORED as mount_status [ID 1347155.1]

Unfortunately, we had to go through several attempts at getting the disks back into ASM. Our attempts included:

ATTEMPT 1
Run the commands to add the disk back to the diskgroup. I did not see any errors; however, the mount_status changed from IGNORED to CLOSED.
SQL> alter diskgroup RECO add disk 'o/192.168.10.6/RECO_CD_03_exaucel02' force;
SQL> alter diskgroup DATA add disk 'o/192.168.10.6/DATA_CD_03_exaucel02' force;

ATTEMPT 2
Next, we tried to add 2 of the disks to +DATA first as they need to be added in pairs to preserve the disk partnerships.
alter diskgroup data
add failgroup EXAUCEL04 disk 'o/192.168.10.8/DATA_CD_10_exaucel04' force
add failgroup EXAUCEL07 disk 'o/192.168.10.11/DATA_CD_08_exaucel07' force
rebalance power 11;

ATTEMPT 3
Next, we tried the following to clear the cache of the Exadata Auto Mgmt process.
On all DB nodes, identify the PIDs of xdmg and xdwk processes and kill them; then add the disks back into ASM.
ps -ef | grep xdmg
ps -ef | grep xdwk
Since the xd* processes are non-fatal background processes, killing them does not bring down the ASM instance; they will be automatically respawned. Once the xd* processes were back up, the disks were added again.

-- ------------------------------------------------------------------
-- Always the same result:
-- the disk was always dropped by the Exadata Auto Mgmt process.
-- The following is from the ASM alert log.
-- ------------------------------------------------------------------

...
Starting background process XDWK
Sun May 05 02:27:30 2013
XDWK started with pid=40, OS id=28500
SQL> /* Exadata Auto Mgmt: Proactive DROP ASM Disk */
alter diskgroup DATA drop
disk DATA_CD_03_exauCEL02 force
...

The next logical step was to drop and re-create the celldisk and griddisks through the cellcli on the cell.
In order to do this, it is necessary to gather some info on the names and sizes of the disks as follows.
Please note that some of the output has been truncated for brevity.

As in most cases, here’s a note that has the steps and their explanation.
Steps to manually create cell/grid disks on Exadata V2 if auto-create fails during disk replacement [ID 1281395.1]

[root@exaucel02 ~]# cellcli
CellCLI: Release 11.2.3.2.1 - Production on Wed May 29 13:12:33 CDT 2013

Copyright (c) 2007, 2012, Oracle.  All rights reserved.
Cell Efficiency Ratio: 1,955

CellCLI> list physicaldisk
	 28:0     	 L2KDEH    	 normal
	 28:1     	 L5KQS3    	 normal
	 28:2     	 L2KD6X    	 normal
	 28:3     	 L2HTLY    	 normal
	 28:4     	 L2HTB8    	 normal
	 28:5     	 L2KJAB    	 normal
	 28:6     	 L2KJ98    	 normal
	 28:7     	 L2KD54    	 normal
	 28:8     	 L2KD6Z    	 normal
	 28:9     	 L37G5R    	 normal
	 28:10    	 L2KD6V    	 normal
	 28:11    	 L2J1LM    	 normal
...

CellCLI> list lun
	 0_0 	 0_0 	 normal
	 0_1 	 0_1 	 normal
	 0_2 	 0_2 	 normal
	 0_3 	 0_3 	 normal
	 0_4 	 0_4 	 normal
	 0_5 	 0_5 	 normal
	 0_6 	 0_6 	 normal
	 0_7 	 0_7 	 normal
	 0_8 	 0_8 	 normal
	 0_9 	 0_9 	 normal
	 0_10	 0_10	 normal
	 0_11	 0_11	 normal
...

CellCLI> list celldisk
	 CD_00_exaucel02	 normal
	 CD_01_exaucel02	 normal
	 CD_02_exaucel02	 normal
	 CD_03_exaucel02	 normal
	 CD_04_exaucel02	 normal
	 CD_05_exaucel02	 normal
	 CD_06_exaucel02	 normal
	 CD_07_exaucel02	 normal
	 CD_08_exaucel02	 normal
	 CD_09_exaucel02	 normal
	 CD_10_exaucel02	 normal
	 CD_11_exaucel02	 normal
...

CellCLI> list griddisk
	 DATA_CD_00_exaucel02	 active
	 DATA_CD_01_exaucel02	 active
	 DATA_CD_02_exaucel02	 active
	 DATA_CD_03_exaucel02	 active
	 DATA_CD_04_exaucel02	 active
	 DATA_CD_05_exaucel02	 active
	 DATA_CD_06_exaucel02	 active
	 DATA_CD_07_exaucel02	 active
	 DATA_CD_08_exaucel02	 active
	 DATA_CD_09_exaucel02	 active
	 DATA_CD_10_exaucel02	 active
	 DATA_CD_11_exaucel02	 active
	 RECO_CD_00_exaucel02	 active
	 RECO_CD_01_exaucel02	 active
	 RECO_CD_02_exaucel02	 active
	 RECO_CD_03_exaucel02	 active
	 RECO_CD_04_exaucel02	 active
	 RECO_CD_05_exaucel02	 active
	 RECO_CD_06_exaucel02	 active
	 RECO_CD_07_exaucel02	 active
	 RECO_CD_08_exaucel02	 active
	 RECO_CD_09_exaucel02	 active
	 RECO_CD_10_exaucel02	 active
	 RECO_CD_11_exaucel02	 active

CellCLI> list physicaldisk where name=28:3 detail
	 name:              	 28:3
	 deviceId:          	 24
	 diskType:          	 HardDisk
	 enclosureDeviceId: 	 28
	 errMediaCount:     	 0
	 errOtherCount:     	 0
	 foreignState:      	 false
	 luns:              	 0_3
	 makeModel:         	 "SEAGATE ST32000SSSUN2.0T"
	 physicalFirmware:  	 061A
	 physicalInsertTime:	 2010-12-21T01:04:07-06:00
	 physicalInterface: 	 sas
	 physicalSerial:    	 L2HTLY
	 physicalSize:      	 1862.6559999994934G
	 slotNumber:        	 3
	 status:            	 normal

CellCLI>  list griddisk where celldisk=CD_03_exaucel02 attributes name,size,offset
	 DATA_CD_03_exaucel02	 1562G     	 32M
	 RECO_CD_03_exaucel02	 299.65625G	 1562.046875G

Using the above names and sizes, I then dropped and re-created the celldisk and griddisks, and then added the disk back into their respective diskgroups.

CellCLI> drop   celldisk CD_03_exaucel02 force

CellCLI> create celldisk CD_03_exaucel02 lun=0_3

CellCLI> create griddisk DATA_CD_03_exaucel02 celldisk=CD_03_exaucel02,size=1562G

CellCLI> create griddisk RECO_CD_03_exaucel02 celldisk=CD_03_exaucel02,size=299.65625G

CellCLI> list griddisk where celldisk=CD_03_exaucel02 attributes name,size,offset

SQL> alter diskgroup DATA add disk 'o/192.168.10.6/DATA_CD_03_exaucel02' ;
SQL> alter diskgroup RECO add disk 'o/192.168.10.6/RECO_CD_03_exaucel02' ;

The rebalance operation was now running and could be seen in the gv$asm_operation view in the ASM instances. When it completed, the disks were back in ASM.
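
A quick way to watch the rebalance from any ASM instance is to query gv$asm_operation (a minimal sketch; the column list is trimmed to the essentials):

SQL:ASM> select inst_id, group_number, operation, state, power, sofar, est_work, est_minutes
           from gv$asm_operation;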

From V$ASM_DISK
SQL:ASM> @asm_info

GROUP_NUMBER FAILGROUP                      PATH                                     MOUNT_STATUS STATE
------------ ------------------------------ ---------------------------------------- ------------ --------
           1 EXAUCEL02                      o/192.168.10.6/DATA_CD_03_exaucel02      CACHED       NORMAL
           2 EXAUCEL02                      o/192.168.10.6/RECO_CD_03_exaucel02      CACHED       NORMAL

The same steps were followed to add the other disk back into ASM.

Root Filesystem Full – No Space Left on Device due to open files

Here’s an interesting scenario that I was asked to look into recently, with the root file system on an Oracle database server filling up. Normally cleaning up disk space is straightforward: find the large and/or old files and delete them. However, in this case there was a difference in the space usage reported by df and du, and the find utility could not locate any file over 1G in size.

Here’s the status of the root file system which was causing the “No Space Left on Device” error message.

# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       30G   30G     0 100% /

After deleting around 2G of old files and logs, the error went away, but the output of df -h showed the root file system slowly filling up again. The directory sizes hardly changed at all, only by a few MB. From the “/” directory, here are all the directories that reside on the “/” file system (as confirmed with df -h $dir).

# du -sh *
7.7M     bin
67M     boot
3.5M     dev
8.5M     etc
1.2G     home
440M     lib
28M     lib64
16K     lost+found
4.0K     media
1.2G     mnt
6.8G     opt
1.1G     root
41M     sbin
4.0K     selinux
4.0K     srv
0     sys
12M     tmp
3.0G     usr
260M     var

Notice here that the sum of these directories only adds up to around 15G, leaving the rest of the used space unaccounted for, and the file system used space was still increasing.

Next was to look at open files. It is worth mentioning here that even if a file is deleted, its space may not be reclaimed if the process that created it, or is still using it, is still running. The lsof (list open files) utility will show these files.

# lsof | grep deleted
...
expdp      7271  oracle    1w      REG              253,0 16784060416    2475867 /home/oracle/nohup.out (deleted)
expdp      7271  oracle    2w      REG              253,0 16784060416    2475867 /home/oracle/nohup.out (deleted)
…
#
# ps -ef | grep 7271
oracle    7271     1 99 May31 ?        3-10:43:36 expdp               directory=DP_DIR dumpfile=exp_schema.dmp logfile=exp_schema.log schemas=schema

The above shows an export data pump job (pid 7271) whose process was still running at the OS level, although the job was no longer running in the database. It was probably canceled for some reason but never cleaned up, and although the nohup.out file had been deleted, the background process still held it open, so the file kept growing and filling up the “/” partition. It is worth mentioning here that the use of nohup is NOT desired with data pump. The data pump utilities are server-side processes; if you kick off a job and then lose your terminal for whatever reason, the data pump job keeps running.
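
If you need to check on or stop an orphaned data pump job, it is usually cleaner to manage it from the server side than to kill the OS process; a minimal sketch (the job name below is hypothetical, the real one comes from dba_datapump_jobs):

SQL> select owner_name, job_name, state from dba_datapump_jobs;

> expdp \"/ as sysdba\" attach=SYS_EXPORT_SCHEMA_01
Export> KILL_JOB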

Once the expdp process 7271 was killed at the OS level, the space was reclaimed.

# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       30G   13G   16G  45% /