It's All About ORACLE

Rapidly Changing Dimension (RCD) in Data Warehouse

Posted by HIMANSHU KARKI at 7:51 PM comments (2)

A dimension is a fast changing or rapidly changing dimension if one or more of its attributes in the table changes very fast and in many rows. Handling rapidly changing dimension in data warehouse is very difficult because of many performance implications.

As you know slowly changing dimension type 2 is used to preserve the history for the changes. But the problem with type 2 is, with each and every change in the dimension attribute, it adds new row to the table. If in case there are dimensions that are changing a lot, table become larger and may cause serious performance issues. Hence, use of the type 2 may not be the wise decision to implement the rapidly changing dimensions.

For example: Consider patient dimension where there are 1000 rows in it. On average basis, each patient changes the 10 of attributes in a year. If you use the type 2 to manage this scenario, there will be 1000*10 = 10000 rows. Imagine if the table has 1 million rows, it’ll become very hard to handle the situation with type 2. Hence we use rapidly changing dimension approach.

Separate Rapidly Changing Attribute by Implementing Junk Dimension

Consider the fact table, in which not all the attributes of the table changes rapidly. There may be some attribute that may be changing rapidly and other not. The idea here is to separate the rapidly changing attribute from the slowly changing ones and move those attribute to another table called junk dimension and maintain the slowly changing attribute in same table. In this way, we can handle situation of increasing table size.

PATIENT

Patient_id

Name

Gender

Marital_status

Weight

BMI

The attribute like patient_id, Name, Gender, Marital_status will not change or changes very rarely. And attribute like weight and BMI (body mass index) changes every month based on the patient visit to hospital. So, we need to separate the weight column out of the patient table otherwise we end up filling the table if we use SCD type 2 on PATIENT dimension. We can put the weight column which is rapidly changing into junk dimension table. Below is the structure of Junk dimension table.

PATIENT_JNK_DIM

Pat_SK

Weight

BMI

Pat_SK is the surrogate key and acts as a primary key for junk dimension table. Below is how our PATIENT table looks after removing weight and BMI from it.

PATIENTPatient_idNameGenderMarital_status

Link Junk Dimension and PATIENT Table

In this step, we must link the junk dimension and patient table. Keep in mind; we cannot simply refer the junk dimension table by adding its primary key to patient table as foreign key. Because any changes made to junk dimension will have to reflect in the patient table, this obviously increases the data in patient dimension. Instead, we create one more table called mini dimension that acts as a bridge between Patient and Junk dimension. We can also add the columns such as start and end date to track the change history. Below is the structure of the mini dimension:
PATIENT_MINI_DIMPat_idPat_SKSTART_DATEEND_DATE

This table is just bridge between two tables and does not require any surrogate key in it. Below is the diagrammatic representation of the Rapidly Changing Dimension implementation.

Oracle Exchange Partitioning Explored

Posted by HIMANSHU KARKI at 10:34 PM comments (6)

You can convert a partition or subpartition into a nonpartitioned table, and a nonpartitioned table into a partition or subpartition of a partitioned table by exchanging their data segments. You can also convert a hash partitioned table into a partition of a composite *-hash partitioned table, or convert the partition of a composite *-hash partitioned table into a hash partitioned table. Similarly, you can convert a range- or list-partitioned table into a partition of a composite *-range or -list partitioned table, or convert a partition of the composite *-range or -list partitioned table into a range- or list-partitioned table.

Exchanging table partitions is useful to get data quickly in or out of a partitioned table. For example, in data warehousing environments, exchanging partitions facilitates high-speed data loading of new, incremental data into an existing partitioned table.

OLTP and data warehousing environments benefit from exchanging old data partitions out of a partitioned table. The data is purged from the partitioned table without actually being deleted and can be archived separately afterward.

When you exchange partitions, logging attributes are preserved. You can optionally specify if local indexes are also to be exchanged with the INCLUDING INDEXES clause, and if rows are to be validated for proper mapping with the WITH VALIDATION clause.

When you specify WITHOUT VALIDATION for the exchange partition operation, this is normally a fast operation because it involves only data dictionary updates. However, if the table or partitioned table involved in the exchange operation has a primary key or unique constraint enabled, then the exchange operation is performed as if WITH VALIDATION were specified to maintain the integrity of the constraints.

You must use the UPDATE GLOBAL INDEXES or UPDATE INDEXES on the exchange partition of the parent table in order for the primary key index to remain usable.

To avoid the overhead of this validation activity, issue the following statement for each constraint before performing the exchange partition operation:

ALTER TABLE table_name
DISABLE CONSTRAINT constraint_name KEEP INDEX

Enable the constraints after the exchange.

Partitioning an Existing Table using EXCHANGE PARTITION

This article presents a simple method for partitioning an existing table using the EXCHANGE PARTITION syntax. We have a a non-partitioned table BIG_TABLE and we will be converting it into Range Partitioned table BIG_TABLE2.

Create a Sample Schema
First we create a sample schema as our starting point.

-- Create and populate a small lookup table.
CREATE TABLE lookup (
id NUMBER(10),
description VARCHAR2(50)
);

ALTER TABLE lookup ADD (
CONSTRAINT lookup_pk PRIMARY KEY (id)
);

INSERT INTO lookup (id, description) VALUES (1, 'ONE');
INSERT INTO lookup (id, description) VALUES (2, 'TWO');
INSERT INTO lookup (id, description) VALUES (3, 'THREE');
COMMIT;

-- Create and populate a larger table that we will later partition.
CREATE TABLE big_table (
id NUMBER(10),
created_date DATE,
lookup_id NUMBER(10),
data VARCHAR2(50)
);

DECLARE
l_lookup_id lookup.id%TYPE;
l_create_date DATE;
BEGIN
FOR i IN 1 .. 1000000 LOOP
IF MOD(i, 3) = 0 THEN
l_create_date := ADD_MONTHS(SYSDATE, -24);
l_lookup_id := 2;
ELSIF MOD(i, 2) = 0 THEN
l_create_date := ADD_MONTHS(SYSDATE, -12);
l_lookup_id := 1;
ELSE
l_create_date := SYSDATE;
l_lookup_id := 3;
END IF;

INSERT INTO big_table (id, created_date, lookup_id, data)
VALUES (i, l_create_date, l_lookup_id, 'This is some data for ' || i);
END LOOP;
COMMIT;
END;
/

-- Apply some constraints to the table.
ALTER TABLE big_table ADD (
CONSTRAINT big_table_pk PRIMARY KEY (id)
);

CREATE INDEX bita_created_date_i ON big_table(created_date);

CREATE INDEX bita_look_fk_i ON big_table(lookup_id);

ALTER TABLE big_table ADD (
CONSTRAINT bita_look_fk
FOREIGN KEY (lookup_id)
REFERENCES lookup(id)
);

-- Gather statistics on the schema objects
EXEC DBMS_STATS.gather_table_stats(USER, 'LOOKUP', cascade => TRUE);

EXEC DBMS_STATS.gather_table_stats(USER, 'BIG_TABLE', cascade => TRUE);

Create a Partitioned Destination Table
Next we create a new table with the appropriate partition structure to act as the destination table. The destination must have the same constraints and indexes defined.

-- Create partitioned table.
CREATE TABLE big_table2 (
id NUMBER(10),
created_date DATE,
lookup_id NUMBER(10),
data VARCHAR2(50)
)
PARTITION BY RANGE (created_date)
(PARTITION big_table_2007 VALUES LESS THAN (MAXVALUE));

-- Add new keys, FKs and triggers.
ALTER TABLE big_table2 ADD (
CONSTRAINT big_table_pk2 PRIMARY KEY (id)
);

CREATE INDEX bita_created_date_i2 ON big_table2(created_date) LOCAL;
CREATE INDEX bita_look_fk_i2 ON big_table2(lookup_id) LOCAL;

ALTER TABLE big_table2 ADD (
CONSTRAINT bita_look_fk2
FOREIGN KEY (lookup_id)
REFERENCES lookup(id)
);

With this destination table in place we can start the conversion.

EXCHANGE PARTITION
We now switch the segments associated with the source table and the partition in the destination table using the EXCHANGE PARTITION syntax.

ALTER TABLE big_table2
EXCHANGE PARTITION big_table_2007
WITH TABLE big_table
WITHOUT VALIDATION
UPDATE GLOBAL INDEXES;

The exchange operation should not be affected by the size of the segments involved.

Once this is complete we can drop the old table and rename the new table and all it's constraints.

DROP TABLE big_table;

RENAME big_table2 TO big_table;

ALTER TABLE big_table RENAME CONSTRAINT big_table_pk2 TO big_table_pk;
ALTER TABLE big_table RENAME CONSTRAINT bita_look_fk2 TO bita_look_fk;
ALTER INDEX big_table_pk2 RENAME TO big_table_pk;
ALTER INDEX bita_look_fk_i2 RENAME TO bita_look_fk_i;
ALTER INDEX bita_created_date_i2 RENAME TO bita_created_date_i;

SPLIT PARTITION
Next, we split the single large partition into smaller partitions as required.

ALTER TABLE big_table
SPLIT PARTITION big_table_2007 AT (TO_DATE('31-DEC-2005 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
INTO (PARTITION big_table_2005,
PARTITION big_table_2007)
UPDATE GLOBAL INDEXES;

ALTER TABLE big_table
SPLIT PARTITION big_table_2007 AT (TO_DATE('31-DEC-2006 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
INTO (PARTITION big_table_2006,
PARTITION big_table_2007)
UPDATE GLOBAL INDEXES;

EXEC DBMS_STATS.gather_table_stats(USER, 'BIG_TABLE', cascade => TRUE);

The following queries show that the partitioning was successful.
SELECT partitioned
FROM user_tables
WHERE table_name = 'BIG_TABLE';

PAR
---
YES

1 row selected.

SELECT partition_name, num_rows
FROM user_tab_partitions
WHERE table_name = 'BIG_TABLE';

PARTITION_NAME NUM_ROWS
------------------------------ ----------
BIG_TABLE_2005 335326
BIG_TABLE_2006 332730
BIG_TABLE_2007 334340

3 rows selected.

Exchanging a Range-Partitioned Table with a *-Range Partition

The semantics of the ALTER TABLE EXCHANGE PARTITION statement are the same as described previously in "Exchanging a Hash Partitioned Table with a *-Hash Partition". The example below shows the orders table, which is interval partitioned by order_date, and subpartitioned by range on order_total. The example shows how to exchange a single monthly interval with a range-partitioned table.

CREATE TABLE orders_mar_2007
( id NUMBER
, cust_id NUMBER
, order_date DATE
, order_total NUMBER
)
PARTITION BY RANGE (order_total)
( PARTITION p_small VALUES LESS THAN (1000)
, PARTITION p_medium VALUES LESS THAN (10000)
, PARTITION p_large VALUES LESS THAN (100000)
, PARTITION p_extraordinary VALUES LESS THAN (MAXVALUE)
);

Populate the table with orders for March 2007. Then create an interval-range partitioned table:

CREATE TABLE orders
( id NUMBER
, cust_id NUMBER
, order_date DATE
, order_total NUMBER
)
PARTITION BY RANGE (order_date) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
SUBPARTITION BY RANGE (order_total)
SUBPARTITION TEMPLATE
( SUBPARTITION p_small VALUES LESS THAN (1000)
, SUBPARTITION p_medium VALUES LESS THAN (10000)
, SUBPARTITION p_large VALUES LESS THAN (100000)
, SUBPARTITION p_extraordinary VALUES LESS THAN (MAXVALUE)
)
(PARTITION p_before_2007 VALUES LESS THAN (TO_DATE('01-JAN-2007','dd-
MON-yyyy')));

It is important that the partitioning key in the orders_mar_2007 table matches the subpartitioning key in the orders table.

Next, exchange the partition. Because an interval partition is to be exchanged, the partition is first locked to ensure that the partition is created.

LOCK TABLE orders PARTITION FOR (TO_DATE('01-MAR-2007','dd-MON-yyyy'))
IN SHARE MODE;

ALTER TABLE orders
EXCHANGE PARTITION
FOR (TO_DATE('01-MAR-2007','dd-MON-yyyy'))
WITH TABLE orders_mar_2007

WITH VALIDATION;

Partitioning a table online with DBMS_REDEFINITION

Posted by HIMANSHU KARKI at 11:13 AM comments (2)

If there is a requirement to change the structure of a table that is already in use productively, it may be impossible to get a maintenance downtime for that table, because it is constantly in use. That can be the case for all kind of structural changes of a table, particularly for the change from an ordinary heap table into a partitioned table, which we are going to take here as an example. In order to demonstrate that, we will create a demonstration user with a non-partitioned table with privileges and additional dependent objects on it:

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam

Connected.

SQL> create table original as select

rownum as id,

mod(rownum,5) as channel_id,

5000 as amount_sold,

mod (rownum,1000) as cust_id,

sysdate as time_id

from dual connect by level<=1e6;

Table created.

SQL> create index original_id_idx on original(id) nologging;

Index created.

SQL> grant select on original to hr;

Grant succeeded.

The challenge is now to change this table into a partitioned one while it is used with DML & queries by end users. For this purpose, we introduced already in 9i (if I recall it right) the package DBMS_REDEFINITION. First step would be to ask, whether it can be used in this case:

SQL> select * from v$version;

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

PL/SQL Release 11.2.0.1.0 - Production

CORE 11.2.0.1.0 Production

TNS for Linux: Version 11.2.0.1.0 - Production

NLSRTL Version 11.2.0.1.0 - Production

SQL> begin

dbms_redefinition.can_redef_table

(uname=>'ADAM',

tname=>'ORIGINAL',

options_flag=>DBMS_REDEFINITION.CONS_USE_ROWID);

end;

PL/SQL procedure successfully completed.

Because there is no Primary Key on the original table, I have to use CONS_USE_ROWID, else I could use CONS_USE_PK. There are no objections against the online redefinition of the table here – else an error message would appear. Next step is to create an interim table of the structure, desired for the original table. In my case, I create it interval partitioned (an 11g New Feature). I could also change storage attributes and add or remove columns during that process.

SQL> create table interim

(id number,

channel_id number(1),

amount_sold number(4),

cust_id number(4),

time_id date)

partition by range (cust_id)

interval (10)

(partition p1 values less than (10));

Table created.

My original table has 1000 distinct cust_ids, so this will lead to 100 partitions – each partion will contain 10 distinct cust_ids. One benefit of that would be the possibility of partition pruning, should there be statements, specifying the cust_id in the where-condition. These statements will be about 100 times faster as a full table scan. The next step will basically insert all the rows from the original table into the interim table (thereby automatically generating 99 partitions), while DML during that period is recorded:

SQL> set timing on

SQL>

BEGIN

DBMS_REDEFINITION.START_REDEF_TABLE

(uname=>'ADAM',

orig_table=>'ORIGINAL',

int_table=>'INTERIM',

options_flag=>DBMS_REDEFINITION.CONS_USE_ROWID);

end;

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.76

If this step takes a long time to run it might be beneficial to use the SYNC_INTERIM_TABLE procedure occasionally from another session. That prevents a longer locking time for the last step, the calling of FINISH_REDEF_TABLE. Next step is now to add the dependent objects/privileges to the interim table:

SQL> set timing off

SQL> vari num_errors number

BEGIN

DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS

(uname=>'ADAM',

orig_table=>'ORIGINAL',

int_table=>'INTERIM',

num_errors=>:num_errors);

END;

PL/SQL procedure successfully completed.

SQL> print num_errors

NUM_ERRORS

----------

There was no problem with this step. Until now the original table is still an ordinary heap table – only the interim table is partitioned:

SQL> select table_name from user_part_tables;

TABLE_NAME

------------------------------

INTERIM

In the last step, the two tables change their names and the recorded DML that occured in the meantime gets used for actualization:

SQL> begin

dbms_redefinition.finish_redef_table

(uname=>'ADAM',

orig_table=>'ORIGINAL',

int_table=>'INTERIM');

end;

PL/SQL procedure successfully completed.

We will now determine that the original table is partitioned and the dependencies are still there:

SQL> select table_name,partitioning_type from user_part_tables;

TABLE_NAME PARTITION

------------------------------ ---------

ORIGINAL RANGE

SQL> select count(*) from user_tab_partitions;

COUNT(*)

----------

100

SQL> select grantee,privilege from user_tab_privs_made where table_name='ORIGINAL';

GRANTEE PRIVILEGE

------------------------------ ----------------------------------------

HR SELECT

SQL> select index_name,table_name from user_indexes;

INDEX_NAME TABLE_NAME

------------------------------ ------------------------------

ORIGINAL_ID_IDX ORIGINAL

TMP$$_ORIGINAL_ID_IDX0 INTERIM

The interim table can now be dropped. We changed the table into a partitioned table without any end user noticing it!

This picture illustrates the steps you have seen above:

Conclusion: If a table structure needs to be modified and the table is permanently accessed by end users, this can be done with some effort using DBMS_REFDEFINITION. One common – but not the only possible – use case is the modification of a non-partitioned table into a partitioned one. You have seen a simplified demonstration about it. As always: Don’t bel

Partitioning Decisions & Benefits

Posted by HIMANSHU KARKI at 6:45 AM comments (2)

Understand your partition choices, and choose the best partition and key for your data

The partitioning feature of Oracle Database enables you to partition stored data segments such as tables and indexes for easier management and improved performance. Oracle Database provides many types of partitioning options, including range, list, hash, range/list, and range/hash. The real challenge for database designers is not creating the partitioned objects; rather, it's the decision that precedes it—what type of partitioning to use and what column to use as a partition key. This article describes how to choose the best partitioning strategy, including the best partition type and best partition key column.

A Quick Primer

Partitioning, in a nutshell, stores a data segment such as a table as multiple segments while retaining a logically monolithic structure. The most popular partitioning option is range partitioning, with which you define a range of values for each partition. For example, in a table of customer transactions called TRANS, you can range-partition it by using the TRANS_DT (transaction date) column as the partition key so that a first partition holds records in which the TRANS_DT value is between January 1 and March 31, 2005; the second partition holds records in which the TRANS_DT value is between April 1 and June 30; and so on. Listing 1 creates this table.

Code Listing 1: Script for creating the TRANS table with range partitions


create table trans (

trans_id        number,
trans_dt        date,

store_id        number,
product_code    number,

partition by range (trans_dt)
trans_amount    number(12,2)
)
(

partition y05q1 values less than (to_date('04/01/2005','mm/dd/yyyy')) tablespace y05q1,

partition y05q2 values less than (to_date('07/01/2005','mm/dd/yyyy')) tablespace y05q2,

partition y05q3 values less than (to_date('10/01/2005','mm/dd/yyyy')) tablespace y05q3,

partition y05q4 values less than (to_date('01/01/2006','mm/dd/yyyy')) tablespace y05q4,

partition pmax values less than (maxvalue) tablespace users

)

After you've created the TRANS table, you will see—if you query the DBA_TABLES dictionary view—that the PARTITIONED column value is YES. You can get further details about the partitioning, such as the type of partitioning scheme and the number of partitions, from the DBA_PART_TABLES view. You can get the details of each partition, such as the name and the upper boundary of the partition, from the DBA_TAB_PARTITIONS view, as shown in Listing 2. Note that this view shows the high value of a partition, which is actually the lower boundary of the next partition. A partition holds records up to but not including its high value. For instance, the upper boundary of the Y05Q2 partition is "2005-07-01 00:00:00", meaning this partition contains values up to "2005-06-30 23:59:59". Records with TRANS_DT values of 2005-07-01 00:00:00 and later will be stored in the next partition-—Y05Q3.

Code Listing 2: TRANS partition details


col partition_position format 999 head "Pos"
col partition_name format a10 head "Name"

col high_value format a50 head "High Value"


select partition_position, partition_name, high_value

from dba_tab_partitions

where table_name = 'TRANS' order by 1;
Pos  Name    High Value

---  -----   -----------------------------------------------------------------------

1    Y05Q1   TO_DATE(' 2005-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GRE...

2    Y05Q2   TO_DATE(' 2005-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GRE...

3    Y05Q3   TO_DATE(' 2005-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GRE...

4    Y05Q4   TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GRE...

5    PMAX    MAXVALUE

In the TRANS table example, you could find a column that can be grouped into ranges of values such as dates, but it may not be possible to group all columns that way. For example, a column holding names of U.S. states contains a finite and small number of values. This type of column calls for listpartitioning, in which the partitions hold discrete values instead of ranges. Here is an example:


create table sales (

product_id      number,
trans_amt       number,

state_code      varchar2(2)
sales_dt        date,
)

partition ct    values ('CT'),
partition by list (state_code)
(

partition def   values (default)
partition ca    values ('CA'),

);

Sometimes it may not be possible to define the ranges of—or a set of known values for—each partition. A typical example is a table, CUST, containing customers, with the CUST_ID column as the primary key. The CUST_ID value is an incrementally increasing but rather meaningless number, so a range partition based on these values may also be rather meaningless.

You can partition the CUST table by using hash , where a hash function is applied to the partition key of each row and, based on the output, the row is placed in an appropriate partition. All the hash partitions hold an equal number of rows. Here is how you create the CUST table with four hash partitions:


create table cust (

cust_id             number,

cust_name       varchar2(20)
)
partition by hash (cust_id)

partitions 4;

Indexes

You can create two types of indexes on partitioned tables:

Local. The index is partitioned in exactly the same way as the base table. For instance, you can create a local index on the TRANS_AMOUNT column of the TRANS table as follows:


create index in_trans_01
on trans (trans_amount)

local;

This creates a range-partitioned index, on the TRANS_DT column, the same way the TRANS table is partitioned. All the index entries for a specific partition, such as Y05Q1, will exist only inside the corresponding partition of the index.

Global. The index can span all partitions in the base table. For example, suppose there is a primary key on TRANS, on the TRANS_ID column. The primary key can be anywhere inside the table, across all the partitions. In this case, the index entries of a partition of the table may exist outside the corresponding partition of the index. For such a situation, create a global index, as follows:


alter table trans 

add constraint pk_trans
primary key (trans_id)

using index global;

The Partition Decision

Now that you've seen the basics of partitioning, here comes the tough part: How do you decide on the type and columns used for partitioning? Your choice of partitioning has to address key partitioning objectives for manageability and performance. How you partition will be based on the priorities you assign to your objectives. I break down the objectives into the following categories:

Performance
Ease of administration
Data purge
Data archiving
Data movement
Data lifecycle management
Efficiency of backup

Performance. This, of course, is the primary objective of many partitioning schemes. Performance advantages come from partition pruning or partitionwise joins, so if your queries do a lot of full-table scans, partitioning will help immensely, because partitions will limit the scope of the search. Consider a scenario involving the TRANS table. Suppose everyone is interested in the aggregate sales figures but product managers (PMs) are interested in the sales figures of their respective products only, not in all the sales. So the PM of product code "8" issues this SQL statement several times an hour:


select trans_dt, sum(trans_amount), avg (trans_amount), 
max(trans_amount), min(trans_amount)
from trans

group by trans_dt;

where product_code = 8

You can improve the performance of this query by building an index on the PRODUCT_CODE column, but it will increase the execution time on INSERT statements and might also adversely influence the execution plans of other queries on the same table. Therefore, it might be judicious not to create the index but to list-partition the table on the PRODUCT_CODE column, with one partition per PRODUCT_CODE value. The PM's query will still do a full-table scan, but this time the scope of the "full table" is actually the partition , not the entire table. This enhances the performance immensely. Recall that list partitioning was preferred over range partitioning in this case because the PRODUCT_CODE is a discrete value.

If two rather large tables are often joined together in SQL statements, consider the potential benefits of partitionwise joins. Suppose you have another table called ADVERTISING, which contains detailed information on the company's advertising. Here is a description of the ADVERTISING table:


Name         Null?         Type

---------    ---------     -------------
AD_ID                      NUMBER

PRODUCT_CODE               NUMBER
AD_DT                      DATE
AD_COST                    NUMBER

CHANNEL                    VARCHAR2(10)

Suppose that PMs often query the ADVERTISING and TRANS tables to see the revenue vis-à-vis various advertising amounts spent to check the effectiveness of the advertising campaigns. Here is the query they use to find out the revenue numbers and advertising expenditures for a specific period for each product:


select t.product_code, sum(trans_amount), sum(ad_cost)
from trans t, advertising a
where t.trans_dt = a.ad_dt

and trans_dt between '1-aug-05' and 
and t.product_code = a.product_code
'1-sep-05'

group by t.product_code;

Here the tables are joined on the columns TRANS_DT and AD_DT, both date columns. To enhance performance, you might decide to partition both tables in the same way. In this case, range partitioning on the columns being joined and partitioning on the same ranges is the most effective. Listing 3 shows the script for creating the ADVERTISING table with this range partition.

Code Listing 3: Script for creating the ADVERTISING table


create table advertising (
ad_id   number,

product_code    number,
ad_dt   date,
        ad_cost number,

partition by range (ad_dt)
channel varchar2(10)
)
(

partition y05q1 values less than (to_date('04/01/2005','mm/dd/yyyy')) tablespace y05q1,

partition y05q2 values less than (to_date('07/01/2005','mm/dd/yyyy')) tablespace y05q2,

partition y05q3 values less than (to_date('10/01/2005','mm/dd/yyyy')) tablespace y05q3,

partition y05q4 values less than (to_date('01/01/2006','mm/dd/yyyy')) tablespace y05q4,

partition pmax values less than (maxvalue) tablespace y05q4

);

Similarly, if you often join these tables on the PRODUCT_CODE column, then that column should be a strong candidate for the partition key. And because product code values are in a specific set, the partitioning scheme should be list partitioning.

If the ADVERTISING and TRANS tables are not joined on some specific date or date fields but rather on some sequential number such as the AD_ID or TRANS_ID value, you should reevaluate the partitioning strategy. Because records have unique values in those columns, you can't choose a list-partitioning strategy (which requires discrete, finite values). A range partition would be ineffective, because the sequentially increasing values in these two columns would require numerous partitions to be created that might not have any specific meaning.

In such a case, the hash partition is useful. It makes sure the records are evenly spread over all the partitions. You can create the tables as follows:


create table trans (

trans_id        number,
trans_dt        date,

store_id        number,
product_code    number,

partition by hash (trans_id)
trans_amount    number(12,2)
)
(

partition trans2 tablespace trans2,
partition trans1 tablespace trans1,

partition trans4 tablespace trans4
partition trans3 tablespace trans3,
)
/
create table advertising (      

ad_cost number,
ad_id   number,
        ad_dt   date,
        product_code    number,

partition ad2 tablespace ad2,
channel varchar2(10)
)
partition by hash (ad_id)
(
  partition ad1 tablespace ad1,

/
partition ad3 tablespace ad3,
  partition ad4 tablespace ad4

)

After the tables are created, you can use the DBMS_XPLAN tool to check to make sure the partitions are used properly. First you have to generate a plan for the PM's product query:


explain plan for

select t.product_code, sum(trans_amount), sum(ad_cost)
from trans t, advertising a
where t.trans_id = a.ad_id

and t.product_code = a.product_code
and t.trans_id = 101

group by t.product_code;

Then you can see what the plan looks like:

select * from table(dbms_xplan.display);

The output is shown in Listing 4. Note the PSTART and PSTOP columns, which indicate which table partitions were selected. For the line with Id=4, the values of these columns are 4 and 4, which indicates that the fourth partition of the TRANS table was selected, where the record with TRANS_ID=101 resides. Note the line with Id=6, which shows that the optimizer will search only partition 4 of the joined ADVERTISING table as well, not the entire table. In the absence of an index, this hash-partition strategy makes the PM's query much faster than a full-table scan on the entire table, for both the TRANS and ADVERTISING tables.

Ease of administration. Partitioning is always introduced as a technique for managing large objects. Although it's applicable to objects of any size, the advantages are more obvious in large tables. When you rebuild an index on a nonpartitioned table, your only option is to build the entire index in one statement. If the table is partitioned, however, you can rebuild partitions of local indexes one at a time. For instance, in the case of the TRANS table, you can rebuild the Y05Q1 partition of the IN_TRANS_01 local index:

alter index in_trans_01 rebuild partition y05q1;

as opposed to rebuilding the IN_TRANS_01 index as a whole. The advantage of this approach is even more pronounced when you load a specific partition and the index needs to be rebuilt only on that partition, not on the rest of the index. In addition to doing index rebuilds, you can also move tables across tablespaces, export tables, delete data, and so on—one partition at a time.

Data purge. Suppose that TRANS is a table in an OLTP system and you want to remove old records, such as the ones with TRANS_DT values in the first quarter of 2005. The conventional method is to delete the records, but deletion causes undo and redo information to be generated, which impacts database performance. This can also be a very time-consuming operation, depending on the volume of data being purged. An alternative to deleting records is to drop a partition. In the case of the TRANS table, if you decided to drop all the records from the first quarter of 2005, you'd issue

alter table trans drop partition y05q1;

That's it. After you execute this command, the partition disappears from the table. This statement does not actually delete anything from the table; it merely updates the data dictionary to indicate that the Y05Q1 partition no longer belongs to the table. Because no data moves (except a small amount of information in the data dictionary), undo and redo generation is minimal, making the operation extremely fast compared to the deletion approach and virtually impact-free in terms of performance.

Note that a regular DELETE statement deletes table rows from wherever they exist, leaving empty spaces in the blocks. Although the table data may be sparse, the overall size of the segment may still be large and have a very high high-water mark (HWM, the largest size the table has ever occupied). A high HWM slows full-table scans, because Oracle Database has to search up to the HWM, even if there are no records to be found. When a partition is dropped, the segment itself is gone and the HWM of other segments remains the same, so the query runs faster.

While the DELETE statement deletes the rows, the corresponding index entries are also adjusted, causing fragmentation, which may impact performance.

When a partition is dropped, the corresponding partition of any local index is also dropped. If there is a global index, however, that index can become unusable when you drop the partition. To prevent the index from becoming unusable, in Oracle9i Database and later, you can update the global index when you drop the partition. The following drops the Y05Q1 partition and updates the global index:

alter table trans drop partition y05q1 update global indexes;

If you decide not to do an automatic index update as part of dropping the partition, that global index will become unusable. You will have to rebuild the index to make it usable, but you can defer that task to later, after you've dropped other partitions or done some other operations.

You can manually rebuild the global index on TRANS, by using

alter index pk_trans rebuild;

Data archiving. If you decide to purge data without retaining it, you can use the technique described in the previous section. Data to purge without retention includes log or debug information, which can simply be dropped. Most information, however, probably needs to be archived for future access. In the TRANS table example, if you wanted to store the contents of the Y05Q1 partition before purging it, you could use the following partition exchange technique:

1. Create a table that is almost identical in structure to the TRANS table, except that it is not partitioned:


create table trans_y05q1 as
select * from trans

where 1=2;

2. This creates an empty unpartitioned table TRANS_Y05Q1, whose structure is identical to that of TRANS. The 1=2 clause returns false, so no rows of the TRANS table are transferred. Exchange the contents of the partition with this new table:


alter table trans

exchange partition y05q1

with table trans_y05q1;

This operation makes the data inside the Y05Q1 partition appear inside the TRANS_Y05Q1 table and empties the partition. The data does not physically move from the partition to the new table. This exchange partition statement merely updates the data dictionary to reset a pointer from the partition to the table and vice versa. Because there is no physical movement of data, this exchange does not generate redo and undo, making it faster and far less likely to impact performance than traditional data-movement approaches such as INSERT.

After the EXCHANGE PARTITION command is executed, the partition is empty and you can drop it, as shown earlier. After the table is created, you can transport it out of the database and archive it for future use.

Data movement. One of the most challenging aspects of data warehouse administration is the development of ETL (extract, transform, and load) processes that load data from OLTP systems into data warehouse databases. Other instances in which large amounts of data are moved include replicating data from production to QA databases, populating data marts from main data warehouses and vice versa, and populating summary tables from transaction tables. Partitions may prove to be extremely valuable in these processes. Take, for instance, the TRANS table and another table—TRANS_SUMMARY—which records the total transaction amount for a particular date. The following describes the TRANS_SUMMARY table:


SQL> desc trans_summary

Name         Null?         Type

---------    ---------     ---------------------
PRODUCT_CODE               NUMBER

TOT_AMOUNT                 NUMBER(12,2)

A traditional approach to populating this data warehouse table every day might be as follows:


insert into trans_summary

select product_code, sum(trans_amount)
from trans

group by product_code;

This approach works but is plagued by some problems:

1. The INSERT statement will generate a large amount of undo and redo, which will affect the performance of the database. You can reduce the undo and redo generation by introducing multiple commits in a loop while loading data, but the overall elapsed time will increase significantly. You can use the direct path INSERT (using the APPEND hint) with NOLOGGING, as follows:


insert /*+ APPEND NOLOGGING */
into trans_summary

select product_code, sum(trans_amount)
from trans

group by product_code;

Data being loaded into the TRANS_SUMMARY table will remain unavailable, however, until the data load is complete.

2. A daily process will have to wipe the TRANS_SUMMARY table clean before loading fresh data; otherwise, it will load duplicates. This table is unavailable for general use from the time it is truncated until it is finally loaded. On some very large systems—depending on the complexity of the query, the size of the table, and the general load on both the source and target databases—this process can take hours, during which the users cannot see even the old data (which has been deleted prior to loading). If the INSERT statement fails, due to lack of space or data errors, the users will have to wait until the new data is loaded, which again can be hours.

The use of partitions eliminates or largely mitigates these two issues. Because the summary table is grouped by PRODUCT_CODE, you can create the TRANS_SUMMARY table list partitioned on the PRODUCT_CODE column:


create table trans_summary (
product_code number,

tot_amount      number(12,2)
)
partition by list (product_code)
(

partition p2  values (2),
partition p1  values (1),
   partition p3  values (3),

partition p6  values (6),
partition p4  values (4),
   partition p5  values (5),
   partition p7  values (7),

)
partition p8  values (8),
   partition p9  values (9),

partition p10 values (10)

During the loading process, you can load the data from TRANS into TRANS_SUMMARY partition by partition instead of loading the whole table. Here is how you would load the data for PRODUCT_CODE value=1:

1. First create a temporary table whose structure is identical to that of the TRANS table except that it's not partitioned:


create table trans_summary_temp
nologging as
select

cast(1 as number) product_code,
cast(sum(trans_amount) as number(12,2)) tot_amt

group by 1;
from trans

where product_code = 1

Note that the table is created with the NOLOGGING option, which significantly reduces the undo and redo generation. This is a dynamically created table, and while it is being created, the original TRANS and TRANS_SUMMARY tables are fully accessible to users.

2. Once the table has been created, you can exchange the TRANS_SUMMARY table with the p1 partition:


alter table trans_summary
exchange partition p1

with table trans_summary_temp

including indexes;

This operation instantly makes the data in the TRANS_SUMMARY_TEMP table available as the data in the p1 partition of the TRANS_SUMMARY table. This "switch" merely updates the pointers in the data dictionary; no data is actually moved. So this process is extremely fast (usually a few seconds), and the table is locked for that period only. The table data is available to users at all other times in the process. So this approach has four key advantages over loading the table directly:

1. The performance impact is low.

2. The main table is always available (except for a few seconds).

3. In case of data-loading failure, the old data is available.

4. You can refresh data selectively.

Data lifecycle management. Most business data follows a predictable lifecycle: In the beginning, it is accessed moderately; then it is updated heavily; and finally access slows down to almost nothing. Regulatory requirements may mandate retention in the database of even this least accessed data. However, because the older data is accessed with decreasing frequency, you can use partitioning to develop a disk access strategy that lowers the total cost of ownership without sacrificing performance. Put partitions with the most accessed and updated data on the fastest disk storage, and put older, less accessed data on slower—and cheaper—disk storage.

For instance, in the TRANS table, the records of more-recent transactions, such as in the current partition, Y05Q4, are accessed heavily, followed by those of the partition immediately preceding it—Y05Q3—and so on. Because the older partitions are not accessed that frequently, it might be worthwhile to put them on a storage tier that is slower than the one on which the current (and most-accessed) partitions reside. One way to save money by using inexpensive storage is to create a new tablespace on the cheaper storage and move the partition to this new tablespace:

alter table trans move partition y05q1 tablespace y05q1_inexpensive;

During the partition move, the partition will be available for using SELECTs but not for UPDATEs.

Efficiency of backup. Consider the example at the beginning of this article—the TRANS table. In that example, each partition is located in a different tablespace named for the partition. If you can say with certainty that the records with a TRANS_DT value earlier than today (or earlier than some specific date) will not change, then you can also consider the corresponding partition to be read-only. In that case, you can convert the tablespace the partition resides in to read-only, as follows:

alter tablespace y05q1 read only;

When you make the tablespace read-only, the Oracle Recovery Manager (RMAN) backup can exclude it while making backups, because the tablespace will not change over time; one backup is enough. The more tablespaces you can make read-only, the shorter the duration of the RMAN job, which also reduces the load the RMAN job puts on the database. This benefit is most visible in databases containing historical data, especially data warehouses, in which the total amount of data to be backed up is usually quite high and you can make numerous tablespaces read-only.

Decisions, Decisions

To make an informed partitioning decision, first prioritize the objectives for your own partitioning setup. For example, using some of this article's alternatives, suppose you decide on the following order of priority:

1. Data archiving

2. Data purge

3. Efficiency of backup

4. Ease of administration

5. Performance

You have deliberately chosen performance after the other objectives; in a different partitioning situation, however, performance may be the top priority.

Data archiving and data purge. For the data archiving and data purge priorities, you have to choose a differentiating column. Is it time-based, as in the case of TRANS_DT in the TRANS example table? If so, which column differentiates the records to be archived and purged? The TRANS table example uses the TRANS_DT column as the partition key in a range-partitioned table, but suppose you have a similar table with a column named EXPIRY_DT, which indicates the date after which the record will definitely not be updated. In that case, to better meet the data archiving and data purge priorities, make EXPIRY_DT the partitioning key in the range-partitioning scheme.

Suppose the differentiating column contains discrete values, such as CAMPAIGN_CODE (in the ADVERTISING table), denoting advertising campaigns. After each campaign is completed, its records are archived and purged. To meet the data archiving and data purge priorities in this situation, use list partitioning with the CAMPAIGN_CODE column as the partitioning key. A range scheme would be useless here, because you would need to drop partitions of a specific CAMPAIGN_CODE, not the range that contains it.

Now suppose that most queries against the TRANS table do not choose TRANS_DT in the WHERE clause. Choosing this column as a partitioning column does not help performance. Is it a good decision to choose a range-partition scheme, with TRANS_DT as the partition key? The answer lies in your prioritizing. In this example, you list performance after data archiving and data purge, so this scheme offers the best solution. If you had prioritized performance over data archiving and data purge, you would have chosen a different scheme.

Efficiency of backup. If efficient backup is your primary objective, ask yourself this question: Which column makes a record read-only? Suppose the answer is the TRANS_DT column. A record with a TRANS_DT value that is more than a month old will not be updated and should be considered read-only. In such a case, choose range partitioning on TRANS_DT, with each partition residing on a different tablespace, as shown in Listing 1. When the records on a partition will never be updated again, make the corresponding tablespace read-only. The RMAN backups will skip this tablespace during the backup, making the backup process faster.

Ease of administration. If ease of administration is the primary objective, your partition decision must be based on how the data is modified. Suppose you do a lot of index rebuilds on tables, due to data loads. You should note the column that determines which data is to be loaded. Is it a date—as in the TRANS_DT column in the TRANS table? If so, a range-partitioned table with TRANS_DT as the partitioning key is most appropriate. Is this column a discrete value, as in PRODUCT_CODE? If so, list partitioning will be useful. If no such pattern exists, you can use hash partitioning on the table and rebuild the indexes of the generated partitions.

Performance. If performance is the highest priority, choose the column that is used mostly in WHERE conditions to filter rows and in joins with other tables. If it's a time-based column, then range partitioning is very useful, as in the case of TRANS_DT in the TRANS table. Even if it's not time-based, the presence of some logical range will help. For instance, suppose the PRODUCT_CODE column (a number) follows a pattern such as 1000 to 1999 for consumer products, 2000 to 2999 for industrial products, and so on. When users query the table, are they focused on only a specific type of product, such as the industrial category? If that is the case, range partitioning can be used on the PRODUCT_CODE column, with partitions in the ranges 1000-1999, 2000-2999, and so on.

Combining partitioning strategies. You can also combine several of your most important goals with Oracle Database's composite partitioning strategies. For example, you can create a range/list-partitioned ADVERTISING table, using range partitioning on the AD_DT column to satisfy the performance requirement and list partitioning on the PRODUCT_CODE column to satisfy the data purge requirement.

Conclusion

The most important decisions in partition design are choosing the partitioning scheme and the column(s) to partition on, and these decisions depend heavily on the priorities of potentially conflicting objectives. Once you are familiar with the advantages of each type of partitioning scheme, you will be able to choose the scheme and column(s) appropriate for your partitioning needs.

It's All About ORACLE

Oracle - The number one Database Management System. Hope this Blog will teach a lot about oracle.

Rapidly Changing Dimension (RCD) in Data Warehouse

Separate Rapidly Changing Attribute by Implementing Junk Dimension

Link Junk Dimension and PATIENT Table

Oracle Exchange Partitioning Explored

Partitioning a table online with DBMS_REDEFINITION

Partitioning Decisions & Benefits

A Quick Primer

Indexes

The Partition Decision

Decisions, Decisions

Conclusion

You Might Also Like

Pages

Recent Comments

Translate

CLOCK

Search This Blog

About this blog

Oracle the number one Business solution company also It is most powerful database management system.

In this Blog I will be covering topics on oracle database management system, Working and developing Oracle Application, its Database Administration and Database Development concepts, also bit on Unix OS.

About me

Labels

Blog Archive

My Other Blog