Lots of data/Performance Question
Which is better for performance, 1 large table, or a collection of smaller tables, all identical to each other, except by name?
Ok, let's say I have a users table (id, firstname, lastname, etc...) and a single master "downloads" table, containing a record for every download recorded on a website. In the master downloads table, you would have the following:
DownloadID (Primary Key)
UserID (Foreign Key to UsersTable)
FileDownloaded (name of the file)
DownloadDateTime (date/time of the download)
I could alternately set up a new table for each user, named something like "Downloads_<UserID>". The structure would be:
DownloadID (Primary Key)
FileDownloaded (name of the file)
DownloadDateTime (date/time of the download)
I'm wondering which is better for performance (insert and select queries mainly). I'm expecting a lot of records, so my database should grow in size rather quickly. If I had all the data in a single table, and something becomes corrupt, data for all users is affected, whereas that won't happen if each user has their own table.
On the other hand, if I go the multiple-table route, then my stored procedures all get tougher to manage, and less efficient, since I now have to pass a table name (or userID) in as a parameter. Can someone help? Which is the better performance tactic?
[1389 byte] By [
jobo5432] at [2007-12-24]
Yeah, the management aspect of a series of tables as opposed to a single master table is a B****. I know that to keep my data in BNF (normalized) I would use one table. I'd put an index on the foreign key (UserID) for faster searching, but when I say a lot of data, I'm talking about 10,000 - 12,000 rows/day or more, which puts me into the millions of records within 5-6 months. I need to keep the data accessible for 6 months (for reporting), so I can't trigger and dump old records unless they're older than 6 months.
The alternative method allows me separate the data by user, so the size of all combined records are spread out somewhat. I would have to use dynamic SQL virtually everywhere, which is terrible, and doesn't give me too much benefit. Managing is tougher too, since I have all these tables in the db. So if I make a schema change, I now have to change n instances instead of one. It also prevents me from using O/R mapping or a code generator in my C# code, which means 2x the work for me. It's definitely the uglier of the two options I have, but I'm more worried about users accessing their data correctly than I am about the burdens of coding and managing the server.
I think I'm going to take my chances with a master table. There's really no way of knowing how performance will be until I try it anyhow. If it's not working, I'll implement the other method.