Microsoft Decision Trees algorithm

I have read several sources about the Microsoft Decision Trees algorithm, including Claude Seidman's book, a paper on scalable classification over SQL databases, and a paper on learning Bayesian networks. I still don't understand exactly how the algorithm works when it splits on an attribute. I have read that Microsoft Decision Trees uses a Bayesian score as its split criterion. Is that true?

Could anyone help me understand the Microsoft Decision Trees algorithm? Please give a detailed explanation with some example cases.

Thanks for any help.

[825 byte] By [desny] at [2008-2-27]
# 1

There are links to the appropriate research papers at http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/986.aspx

Thanks

-Jamie

JamieMacLennan at 2007-9-9 > top of Msdn Tech,SQL Server,Data Mining...
# 2
Hi there, the first link fails. I am also interested to know how Microsoft Decision Trees splits a continuous attribute when deriving downstream regression models: does it use information gain or a regression approach? Thanks.
HongqinFan at 2007-9-9
# 3
I'd like to clarify my previous question about regression tree building: which method is used to split a continuous variable, information gain or the regression approach? The first method (information gain) bins the continuous attribute, turning it into a categorical variable, and selects the value that gives the highest information gain. With the second method, according to http://msdn2.microsoft.com/en-us/library/ms175312.aspx, the split is performed at the point of non-linearity (a statistical method?). I just wonder which method Microsoft Decision Trees uses when splitting a continuous variable.
HongqinFan at 2007-9-9
# 4
If the target is a discrete attribute, it uses information gain. For continuous targets, it uses the regression approach.
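For readers unfamiliar with the term, here is a minimal Python sketch of textbook information gain for a discrete target. It is only an illustration of the general idea, not the product's actual implementation; the function names, the column names (`owns_home`, `buys_bike`), and the sample rows are all made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    """Entropy reduction in `target` obtained by splitting `rows` on `attribute`."""
    parent = entropy([r[target] for r in rows])
    # Group the target labels by the value of the candidate input attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weight each child's entropy by the share of rows it receives.
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

# Hypothetical sample data, invented for this illustration.
rows = [
    {"owns_home": "yes", "buys_bike": "yes"},
    {"owns_home": "yes", "buys_bike": "yes"},
    {"owns_home": "no", "buys_bike": "no"},
    {"owns_home": "no", "buys_bike": "yes"},
]
gain = information_gain(rows, "owns_home", "buys_bike")
print(round(gain, 4))
```

A tree builder using this criterion would compute the gain for every candidate input attribute and split on the one with the highest value.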
JamieMacLennan at 2007-9-9
# 5
Thank you for your answer, Jamie, but my question is how a continuous INPUT attribute is split (the target variable is assumed to be continuous in the case of a regression tree).
HongqinFan at 2007-9-9
# 6
I was describing how the INPUT is handled for various OUTPUTs. If you are only considering regression trees, then it is always the regression approach.
JamieMacLennan at 2007-9-9
# 7

Hi there,

When I use the Microsoft Decision Trees algorithm to build a regression tree and set SCORE_METHOD to (1) Entropy:

Does the algorithm use the entropy function to select the categorical input attribute to split on? Is a continuous input attribute binned into a categorical attribute for the entropy calculation?

Or does this SCORE_METHOD setting have no effect on a regression tree?

Any help is very much appreciated,

Hongqin

HongqinFan at 2007-9-9
# 8
Continuous variables are split by using an internal binning approach that searches the continuous space for the best split point. This is independent of the score method.
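To make "searching the continuous space for the best split point" concrete, here is a small Python sketch of a generic threshold search. It is an assumption-laden illustration, not the product's internal method: it scores each candidate threshold by the sum of squared errors around the mean on each side, whereas the MSDN page cited earlier in the thread describes splitting at points of non-linearity. The names `sse` and `best_split` and the sample data are invented for the example.

```python
def sse(values):
    """Sum of squared errors around the mean of `values`."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split x <= threshold.

    The threshold is None when no split improves on leaving the data unsplit.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # no threshold can separate identical input values
        # Candidate threshold: midpoint between two neighbouring input values.
        threshold = (left_x + right_x) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = sse(left) + sse(right)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical continuous input (xs) and continuous target (ys).
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 5.0, 20.0, 21.0, 19.0]
threshold, score = best_split(xs, ys)
print(threshold, round(score, 4))
```

Note that nothing in this search depends on an entropy or Bayesian score, which is consistent with the point that the handling of continuous inputs is independent of the score method.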
JamieMacLennan at 2007-9-9
# 9
thx, Jamie.
HongqinFan at 2007-9-9
# 10

Dear all,

I tried changing SPLIT_METHOD to (2) Complete. It doesn't generate any branches or leaves; it shows only the top node of the tree.

But when I changed it back to (3) Both, the result is OK, but it includes both binary splits and complete splits.

How can I make it generate only complete splits?
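For context, the two kinds of split differ in how a discrete attribute partitions the rows: a binary split produces two branches (one value versus everything else), while a complete split produces one branch per distinct value. The Python sketch below only illustrates that difference; the helper names, the `region` column, and the rows are hypothetical and have nothing to do with how the product builds its trees internally.

```python
def complete_split(rows, attribute):
    """One branch per distinct value of `attribute`."""
    branches = {}
    for r in rows:
        branches.setdefault(r[attribute], []).append(r)
    return branches

def binary_split(rows, attribute, value):
    """Two branches: rows where attribute == value, and all other rows."""
    matches = [r for r in rows if r[attribute] == value]
    others = [r for r in rows if r[attribute] != value]
    return {f"{attribute} = {value}": matches, f"{attribute} != {value}": others}

# Hypothetical rows with a three-valued discrete attribute.
rows = [
    {"region": "north"},
    {"region": "south"},
    {"region": "west"},
    {"region": "north"},
]
complete = complete_split(rows, "region")
binary = binary_split(rows, "region", "north")
print(sorted(complete))  # three branches, one per region
print(sorted(binary))    # two branches
```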

Thank you in advance for your answer.

Nop Vorrasanpisut.

NopV at 2007-9-9
