Thursday, December 17, 2015

One does not simply create table with 300 columns


There was quite an interesting post on Oracle OTN, which is worth of this blog post and meme.

Problem was defined as follows:
I have a table which has 300 columns and keeps records 60,00,000 at this time. Records are increasing day by day. When I run a simple query like:

select * from table where store_no = 17

this query takes 30 to 60 seconds for fetching the data against this query which is to much time, while I'll use this table with more filters. Can anyone help me that how i can increase data fetching performance, i have also used a primary key indexing.

If you check supplied DDL against given SQL, you'll likely come to two points
  1. There is no index on column store_no and so Oracle will have to do full table scan
  2. ... 300 columns!?



My suggestion was to reconsider the data model. 300 columns are pretty rough and columns like IT_CLERK_ID2. IT_CLERK_ID3, ... , IT_CLERK_ID5 tend to be suspicious.

Reason for my suggestion was that Oracle will have to split rows in this table in two pieces because they have more than 255 columns. Interestingly enough, the split will be done from the end of the row. So first piece of the row will have first 45 columns and second piece will contain 255 columns. What should come to your mind now is that order of columns in such table is very important and you should place columns you are going to select as close as possible to the start of the row.

Result of this split will be extra CPU usage when selecting rows from second piece. Secondary result might come in form of row chaining between database blocks which will introduce additional single block reads to walk your rows.

Now let's have a little test. I've created two tables where all columns are VARCHAR2 with value 'X' except first and last column which is NUMBER with value 1. First table has 250 columns and second has 300.

I've traced four following SQL statements:

SELECT SUM(c1) from table_250;
SELECT SUM(c250) from table_250;
SELECT SUM(c1) from table_300;
SELECT SUM(c300) from table_300;

Let's check important parts of trace files:



As you can see, sum of column c300 is quite CPU heavy. There are also other consequences of row chaining with connection to table full scan and buffer cache ... which is not as good as you can guess by now.

I would encourage you to check blog post by Jonathan Lewis which covers buffer cache and other important things.

So remember folks ... one does not simply create 300 columns table. You have to have very good reason for that and you have to think about order of columns and consequences.

No comments:

Post a Comment