Strange Behavior of LINQ Union Operator

I'm having problems using the LINQ Union operator to emulate a T-SQL UNION query that's often used to demonstrate SQL UNION query syntax:

SELECT c.CityAS City, c.CompanyName,'Customer'AS [Type]

FROM CustomersAS c, SuppliersAS s

WHERE c.City= s.City

UNIONSELECT s.CityAS City, s.CompanyName,'Supplier'

FROM CustomersAS c, SuppliersAS s

WHERE c.City= s.City

ORDERBY City

that returns the following resultset:

Code Snippet

City CompanyName Type

- --

Berlin Alfreds Futterkiste Customer

Berlin Heli Sü?waren GmbH & Co. KG Supplier

London Around the Horn Customer

London B's Beverages Customer

London Consolidated Holdings Customer

London Eastern Connection Customer

London Exotic Liquids Supplier

London North/South Customer

London Seven Seas Imports Customer

Montral Ma Maison Supplier

Montra Mère Paillarde Customer

Pari Aux joyeux ecclsiastiques Supplier

Paris Paris spcialits Customer

Paris Spcialits du monde Customer

Sao Paulo Comrcio Mineiro Customer

Sao Paulo Familia Arquibaldo Customer

Sao Paulo Queen Cozinha Customer

Sao Paulo Refrescos Americanas LTDA Supplier

Sao Paulo Tradi??o Hipermercados Customer

(19 row(s) affected)

Following is the code for a hack to take advantage of the Country field as a temporary holder for Type values (dcNwind is a new instance of a dbml with Customers, Orders, Order_Details, Products, and Suppliers tables):

dcNwind.ObjectTrackingEnabled = false; // don't track objects
var cust = (from c in dcNwind.Customers // get the Customers
join s in dcNwind.Suppliers
on c.City equals s.City
select new { c.City, c.CompanyName, c.Country }).Distinct();
foreach (var c in cust)
{
c.Country = "Customer";
// to verify update
// sbQuery.Append(c.City + "\t" + c.CompanyName + "\t" + c.Country + System.Environment.NewLine);
}
var lstCust = cust.ToList(); // convert to a List (or Array)
cust = null;
var vend = (from c in dcNwind.Customers
join s in dcNwind.Suppliers
on c.City equals s.City
select new { s.City, s.CompanyName, s.Country }).Distinct();
foreach (var v in vend)
{
v.Country = "Supplier";
// to verify update
// sbQuery.Append(v.City + "\t" + v.CompanyName + "\t" + v.Country + System.Environment.NewLine);
}
var lstVend = vend.ToList(); // convert to a List (or Array)
dcNwind.Dispose(); // get rid of the DataContext
var both = lstVend.Union(lstCust); // perform the Union operation
foreach (var u in both)
{
sbQuery.Append(u.City + "\t" + u.CompanyName + "\t" + u.Country + System.Environment.NewLine);
}
var sort = from j in both // sorted version
orderby j.City
select j;
foreach (var u in sort)
{
//sbQuery.Append(u.City + "\t" + u.CompanyName + "\t" + u.Country + System.Environment.NewLine);
}

Regardless of whether I change the data type to an Array to List, applying the Union operator replaces the Country member values with the original country names from the database. The "DLinq Overview for C# Developers" says on page 7:

"To avoid executing multiple times convert the results into any number of standard collection classes.It is easy to convert the results into a List or Array using the Standard Query OperatorsToList() orToArray()."

The dcNwind.Log shows the following query executes four times, no matter how I change the data types, etc.

Code Snippet
SELECT DISTINCT [t0].[City], [t0].[CompanyName], [t0].[Country]
FROM [dbo].[Customers] AS [t0], [dbo].[Suppliers] AS [t1]
WHERE [t0].[City] = [t1].[City]
SqlProvider\AttributedMetaModel

Am I on the wrong track here? I don't want to get into inheritance and a discriminator column at this point. I'm trying to stay as close as possible to the original T-SQL construct.

--rj

[6785 byte] By [RogerJennings] at [2008-1-6]
# 1

You should just be able to write this:

var q = (

from c in dc.Customers

join s in dc.Suppliers on c.City equals s.City

select new { c.City, c.CompanyName, Type = "Customer" })

.Union(

from c in dc.Customers

join s in dc.Suppliers on c.City equals s.City

select new { s.City, s.CompanyName, Type = "Supplier"});

However, there is a bug on how we treat literals in a union. We are fixing it.

MattWarren-MSFT at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...
# 2

I probably should have mentioned that the preceding code compiles and runs, but returns only "Customer" as the type. If you reverse the sequence of the union (Supplier first) all rows are "Supplier".

The sample code I provided as a workaround for the preceding code demonstrates that converting the query result to an Array or List does not prevent multiple executions.

--rj

RogerJennings at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...
# 3

The 4 executions are due to you enumerating the results of each collection twice. Once each in a foreach loop, and once in a ToList() call. If you call ToList() first and only then enumerate the list you'll only execute each query once.

MattWarren-MSFT at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...
# 4

Unfortunately, it's not fixed in the Orcas June 2007 CTP.

See the Orcas and Entity Framework June 2007 CTPs - New Features post's Join Operator Works But Union Method Is Still Broken topic.

--rj

RogerJennings at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...
# 5
The problem was fixed late in Beta 2 cycle. I'm not sure what bits are in this June CTP. It is likely frozen from a few months back.
MattWarren-MSFT at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...
# 6

What am I doing wrong here? Is it me or is this also a LINQ union bug?

var fields = from f in db.MetaDatas

where f.DBTableName == tableName && f.Required == true

select new { fieldName = f.DBFieldName, dataType = f.DBDataType, nullability = f.Required,IsKey = f.PK };

bool testBool = true;

//GET FIELDS FOR PARENT TABLE

//Determine if table name has a "_" in it. If it does then parse parent

if (tableName.IndexOf('_') > 0)

{

string[] parentTables = tableName.Split(new char[] { '_' });

for (int i = 0; i <= parentTables.Length - 2; i++)

{

//we do not need the PK fields in the parent table

var additionalFields = from pk in db.MetaDatas

where pk.DBTableName == parentTablesIdea && pk.PK == testBool

select new { fieldName = pk.DBFieldName, dataType = pk.DBDataType, nullability = pk.Required, IsKey = pk.PK };

fields.Union(additionalFields);

}

int x=0;

}

additionalFields returns data but it is never added to the fields var after the Union. Is this part of the same bug referenced above? THis code used to say && pk.PK == true however I changed it to PK == true based on this post. However, the new rows from second LINQ expression never added to the first.

Any thoughts?

AndyBlum at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...
# 7


The problem Roger was pointing out had to do with literals in the union. The problem you have is just a misunderstanding of the Union operator. Union is a function that returns a new sequence that is the union of the two sequences. It does not modify the sequence on the left hand side.

Try this instead:

fields = fields.Union(additionalFields);

MattWarren-MSFT at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...
# 8

To quote the famous Homer Simpson....Doh!!!

AndyBlum at 2007-9-28 > top of Msdn Tech,Visual Studio Orcas,LINQ Project General...

Visual Studio Orcas

Site Classified