Tuesday, October 31, 2017

Updating table with data from db link


About two months ago, I was contacted by one my friends. He is quite new to Oracle and he was asking me if I can help him with a service request he was working on. Problem he was facing was following:

  • He needed to update rows in table app.app_num_service with data from table customers which was in remote database
  • Data in table customers was versioned by order of columns STATUS_IND, PRIMARY_IND DESC, SEQNO and he needed to use most recent version for given customer (CUST_CODE)
  • Data in customers table did not contain data about all customers, so some customer codes where non existent in app.app_num_service table

I wont place original statement here. All you need to know is that he was making a rookie mistake of directly joining on big remote table in his update. I'm not claiming it's always bad idea, but definitely here it was. Just imagine how many network round trips double check on customer table for each row would be needed (expecting that CUST_CODE in indexed on remote side in first place).

Generally when we are working with remote databases in ETL processes ... and in cases like this also, we are trying to accomplish few main goals:

  • Minimize amount of data transferred through database link
  • Minimize amount of "queries" through database link
Both these goals go hand in hand with goal of minimizing time we are transferring our data. Don't forget that even select through remote link opens a transaction on remote side and we want to make it as short as possible and constrain remote database with our query for as short time as possible.

Tool we usually use to accomplish this goal is CTAS or TRUNCATE & INSERT APPEND into existing staging table. This goes hand in hand with transferring to local side ONLY data we really need which in our case is like 25% of whole table. What I suggested was following two DDLs:

CREATE TABLE service.banner TABLESPACE service AS
SELECT /*+ DRIVING_SITE(remote_table) */ customercode,
       phonenumbFROM (SELECT ROW_NUMBER()
              OVER (PARTITION BY CUST_CODE
                    ORDER BY STATUS_IND,
                             PRIMARY_IND DESC,
                             SEQNO) rn,
             CUST_CODE customercode,
             CONCAT(PHONE_AREA, PHONE_NUMBER) phonenumber
      FROM customers@remote_db)
WHERE rn=1;

CREATE UNIQUE INDEX service.banner_ix1 ON service.banner(customercode)
TABLESPACE service;

What we did here is, that we transferred only most recent versions of rows we will need for update and then created index for fast access based on customercode.

Hint DRIVING_SITE is quite irrelevant here but I'm used to use it. What is does is that it tells Oracle which database will process the query (driving side) and which one will send data. This comes in play when we are joining tables through database link and we are considering table sizes on both sides.

BEGIN
    dbms_stats.gather_table_stats(
        ownname          => 'SERVICE',
        tabname          => 'BANNER',
        estimate_percent => dbms_stats.auto_sample_size,
        method_opt       => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
        degree           => dbms_stats.auto_degree
    );
END;
/

Now we'll calculate database statistics for our new table. Estimate is most of the time sufficient and quite fast. We do not need to calculate index statistics 'cos we where on Oracle 11g R2 so stats where calculated with index build.

With all of that done, we could run our simple update:

UPDATE app.app_num_service app
SET app.PHONENUMBER = (SELECT PHONENUMBER
                           FROM service.banner
                           WHERE service.customercode = app.customercode)
WHERE EXISTS(SELECT NULLFROM service.banner
             WHERE app.customercode = service.customercode);

COMMIT;

From non-ending update we ended with one pretty fast.

In conclusion you want to grab ONLY what you need and save it locally as fast as possible. Then work with it (probably with some extra work on indexing and stats) as needed.