Staging tables

I've found staging tables to still be necessary (for example, when you have a large dimension table that you can't load into memory. The memory restricted lookup is too slow as it sends a query for each row). I have a few questions:
1. I'd like to create a local temporary table for staging (e.g. ##temp). This doesn't seem to be possible in SSIS as far as I can tell. Anyone get this to work?
2. Failing that, I tried global temporary tables, but that didn't work either.
3. What do you guys use when you need a temporary staging table? Keep in mind that if two people run the same package, I don't want one instance to interfere with the other one.
4. Are there any performance tips I can use to avoid a staging table altogether in this case? I tried lookup caching and it doesn't help at all.

Thanks.

[841 byte] By [JonathanHseu] at [2007-12-17]
# 1

It sounds like your problem is with lookup taking too much memory and you are trying to work around that but the partial and no cache modes are too slow. If this is the case does your data set up in such a way that most of the lookup would normally hit a smallish subset of the full lookup table that is causing your memory problems? If so, then you could create a lookup subset table that contains the subset that gets the most hits. Then you could have 2 lookups in your package. The first would be full cache and load up the subset table. You would then connect the 2nd lookup to the error output of the first lookup and set the first lookup to redirect rows. The 2nd lookup would be configured as no cache (or partial cache if you still feel there is likelyhood of similar hits). This would give you all the performance of a full cache lookup for the parts of your data that hit the subset cache (which is hopefully a large percentage) and you only take the per row hit for the ones that miss the first lookup. Obviously, this works really well in some scenarios but not in others.

HTH,
Matt

MattDavid at 2007-10-6 > top of Msdn Tech,SQL Server,SQL Server Integration Services...
# 2
Nope. The lookup can match any part of a > 100 million table at a random and equal probability.
JonathanHseu at 2007-10-6 > top of Msdn Tech,SQL Server,SQL Server Integration Services...
# 3

Hi Jonathan,

Did you find a good solution to reduce the time taken? I also in such a crisis now. I have to do lookup against various size tables varying from 1m to 800m. The problem is incoming record may hit any record in the lookup table and so I can't take a subset. May be upto 5-10MB, I can use Lookup, but using it beyond this limit is a real time consuming one. Using Partial Cache is do the same thing.

Is there any good alternative method to do that?

Thanks.

Thiru at 2007-10-6 > top of Msdn Tech,SQL Server,SQL Server Integration Services...
# 4

Where your lookup tables exceed the size that makes caching useful or appropriate you can use a number of other techniques.

One is simply to use SQL Server to do the join, perhaps by loading a temp table and joining that in a query with your reference table. This is a classic technique in ELT or push-down optimization scenarios.

Another technique may be to use a merge join in place of the lookup, although that also requires a sort component, which may also be memory bound.

Yet another approach is to identify a subset of the reference table that can be successfully cached - if the business case makes sense. For example, some records may be more likely to be hit than others. In such cases, you may implement a patter involving a cached lookup of most likely candidates, followed by an uncached lookup of records not found on the first pass.

Donald

DonaldFarmer at 2007-10-6 > top of Msdn Tech,SQL Server,SQL Server Integration Services...
# 5
Donald Farmer wrote:

Where your lookup tables exceed the size that makes caching useful or appropriate you can use a number of other techniques.

One is simply to use SQL Server to do the join, perhaps by loading a temp table and joining that in a query with your reference table. This is a classic technique in ELT or push-down optimization scenarios.

I am very interested in the technique you mention. Is there somewhere this is described in detail? I am particularly interested in how to use temp tables (e.g. #temp) in the package.

Thanks.

AlC. at 2007-10-6 > top of Msdn Tech,SQL Server,SQL Server Integration Services...
# 6
We hit the same problem as well. We were on 32 bit servers. We are not only facing hardware limitations but also max 3G memory issue as well.

First I use customized select that only return only the columns that will be used during lookup operation.

2nd, I create extract tables, for example [Extract].[FactOrder], then the lookup task that involve large amount dimension returns a subset of dimension rows via join to the extract table.

TianyuLi at 2007-10-6 > top of Msdn Tech,SQL Server,SQL Server Integration Services...

SQL Server

Site Classified