Typed DataSet's with Large Amounts of Data
Is there any documented maximum number of tables, relations, or records in a typed DataSet object in .Net 2.0?
I have a large DataSet (the select command returns 24 result sets with tens of thousands of records total (approximately 90K records of varying sizes.)) If I fill from a DataAdapter into an untyped DataSet, the fill operation returns in just over 1 minute (approximately the time of the select procedure.) If I then merge this untyped DataSet into my typed data set (after calling BeginLoadData on each table) with constraints turned off (EnforceConstraints = false) the merge takes approximately 20 minutes. If I then enable constraints, the program runs for slightly over 20 minutes, and returns an Out of Memory exception.
I expect that this relates to the fact that this DataSet is VERY hierarchical in nature, and the relationship structure, while a straightforward tree, is non the less very deep.
Can anyone shed any light on ways in which this behavior can be controlled, short of dismantling the DataSet? At this stage of the product, that would be a nearly impossible proposition...
I understand when a large untyped Dataset is merged into a TypedDataset with compatible or same schema the timings are as follows:
1. With constraints turned off, merge takes approx 20 minutes.
2. Enabling of Constraints on the merged typed-dataset, after 20 minutes it returns an out-of-memory exception.
We will explore this problem further, it'd help if you can give an indication of the complexity of schema, like the number of tables and rows participating in the deepest hierarchy in this specific dataset. The number of unqiue, Primarykey, AllobDbNull=false, expression columns, DataViews and other constraints.
Some workarounds:
1. If you have expression columns in the merged TypedDataset, it might help to reset the expression columns to an empty string on the underlying untyped-Dataset (TypedDataset inherits from unTypedDataset, so do the tables ...). Post merge and constraint enforcement turned on, restore the expression on these columns.
Please note that Merge automaitcally turns off constraints and you are not expected to turn it off and on, it should happen automatically - though, I do understand why you are doing it, it helps understand the problem.
2. Another alternative (for .NEt v2.0), trying using DataSet.Load (or DataTable.Load)
The untyped DataSet in this case, is freshly filled from the Database, am assuming it does not have an "changed" rows. In that case a more efficent option in .NET v2.0 is to use:
typedDataSet.Load(untypedDataSet.CreateDataReader(dataTableList), LoadOption.OverwriteChanges, dataTableList);
where: dataTableList is the list of the required DataTables and specifies the order of results in DataReader and also informs Load of the expected order,
Hope this is helpful.
Thanks,
Kawarjit Bedi
kbedi@microsoft.com
ADO.NET Program Manager
Both of your workarounds have merit. I definitely have expression columns in the hierarchy (not many columns, but on the most heavily populated tables), I have the luxury of this being a query coming from the Database side without changes, and I am using .NET 2.0.
I will check with my management with regard to sharing the source (At least the XSD's) with you. Can I share them by EMail rather than on the forum?
Regardless I will let you know on the workarounds.
Andrew,
yes, please feel free to share the schema over email, please make sure that there is no confidential/proprietery information. Forthe purposeof analyzing the problem, I need the structure, you may want to scrub the element/attribute names.
Thanks,
Kawarjit Bedi
kbedi@microsoft.com
I've gotten approval to send you the schema. I'm working on building a test harness for it now, and then I'll send it across (I hope you can take large attachments...)
The first of your workarounds still left me with an out of memory exception on turning constraints back on. I'm working on checking your second one. So far, it's showing promise.
I am in a similar situation but my schema has only two tables. I load data directly into typed dataset using ReadXML with IgnoreSchema option. It works just fine.
I am loading about 100K - 150K rows and 75 columns combined in 3 datasets in my winforms app. Its quite a memory hough. I wonder if there is a way to improve memory footprint of the app. Column data types are string and decimal.
Is it possible to calculate dataset object memory based on columns data types and number of rows.
Thanks,
-Rajeev
BackGround:
DataRow is a wrapper over records (current, proposed and original) the data for which is stored in DataColumns. The storage for each DataColumn is backed by a strongly typed array. The actual column data is stored in this array, one per column.
One simple way to calculate expected memory:
Assuming one DataRecord per DataRow (true for unchanged, deleted or added DataRows) the memory footprint can be estimated using the following:
- A DataRow consumes 100 bytes, so if table has 10 rows, it's 1K.
- Each column is backed by an array of the column's type. For most types, we can assume an average of 4 bytes per record. For string columns, the size depends on the value stored in each column.
The worst case: If all rows are in modified state then each DataRow is associated with both current and original records, i.e. 2 records per DataRow, the size of the array backing the column storage can be twice the number of rows. DataTable manages the usage and growing of the array backing the column-storage, the array size never shrinks, it grows by doubling. In the worst case, a just doubled datacolumn storage array may result in space allocated for 2 records per DataRow for twice the number of existing DataRows.
Tuning for memory footprint
DataTable.MinimumCapacity may be set to the maximum number of expected records. This means the DataTable would never have to reallocate or double it's column-value storage. Teh default value for MinimumCapacity is 50.
Computed Memory used by other associated Objects:
The above computation helps commpute the memory footprint of a DataRow. There are other fixed memory overheads like memory used by DataTable object, DataColumn, DataRelation, etc. These are constant for a DataTable and don't change with the number of rows.
Constraints:
Constraints like PrimaryKey, ForeignKey, UniqueKey, AllowDbNull=false, DataRelations and DataViews are backed by an index which consumes approx 32 bytes per entry. For instance if a DataTable with PrimaryKey has 100 rows, then the index backing PrimaryKey will take 32*100 = 3200 bytes. One good way to reduce memory consumption is to have just the must-have constraints and remove other constraints.
Hope it helps.
Thanks,
Kawarjit Bedi
Hi Kawarjit,
I have similar situation but nothing to do with typed vs untyped dataset. I have some 70k records in a table which servce as a lookup date for some of my methods. This table is part of Informix Dynamic Server and we are using IDS client SDK for .Net 1.1 framework to connect to IDS database. So my question is would it fine to load all the data into Dataset in memory and use that for lookup locally or can we query against the database everytime we need to do lookup. Approximately we will be doing this look up for 50k times. The size of XML after writting the 70k records into XML file through dataset is 12 MB. I don't know what will be the size of this data in dataset in memory.
Also we would like to understand what is the amount of maximum memory that an application can use and what is the amount of memory usage that is said to be optimum. In the current application we will have around 30 MB of memory usage and we would like to know if this is ok.
Thanks,
Venkatesh