Time Series - Key Time and Missing Value Substitution

1. I am using datetime as the Key Time column and if I leave the parameter Missing_Value_Substitution (MVS) blank, it complains there are missing values. I have stripped the data down to a single sample series which *definitely* has exactly 1 row with each month (e.g. from 2003-01-01, 2003-02-01 ... 2005-07-01) and it still gives this error.

2. What happens if there are multiple Key Time values, e.g.

2003-01-01 Bread 100 Shop1

2003-01-01 Bread 200 Shop2

2003-01-01 Milk 100 Shop1

2003-02-01 Bread 100 Shop1

Where Shop1/2 are treated as Inputs but Bread/Milk are in the Case Key column. I do not want to add the Shop ID to the Case Key (i.e. predict Shop1:Bread etc.) as there would be far too many. Does this make sense or am I misunderstanding how it works?

What I am looking for is any seasonal variations in sales (which months it always peaks in) and the reason I do not want to aggregate the sales per shop is that there are separate patterns in different groups of shops. I get the feeling there may be a chicken and egg situation here, as I really need to group the shops with similar Time Series patterns first and THEN do the Time Series on each group to find the pattern?! Any ideas?

Thanks

[1616 byte] By [NeilMunro] at [2008-2-7]
# 1

Here's a response from our Time Series expert:

When both Key Time and Case key are used, then each input column generates as many time series as there are case key values. All these time series must have aligned time stamps - otherwise you must use MISSING_VALUE SUBSTITUTION
Here are a couple of simple examples

Date (Key Time)

Location (Case Key)

Sales

2/15/2006

Redmond

100

2/16/2006

Bellevue

200

2/16/2006

Redmond

300

2/17/2006

Bellevue

400

2/17/2006

Redmond

500

Date (Key Time)

Location (Case Key)

Sales

2/15/2006

Redmond

100

2/15/2006

Bellevue

200

2/16/2006

Redmond

300

2/16/2006

Bellevue

400

2/17/2006

Redmond

500

Each of these tables generates two time series: Sales.Bellevue and Sales.Redmond

The second table has some missing data that requires the use of MISSING_VALUE_SUBSITUTION

Indeed, Sales.Redmond has the value for 2/17 but Sales.Bellevue does not.

The user must indicate what that means (e.g. - no sales or M_V_S = 0 ; or record lost, but the data can be extrapolated from the previous day , i.e. M_V_S = Previous).

The first table has some missing data too, namely Sales.Bellevue for 2/15, but the lack of this kind of data is forgiven, as it is the piece in far past that is missing. The algorithm may simply truncate the Redmond curve and use the aligned values of 2/16 and 2/17 for the two series.

This set of examples is not exhaustive.

Consider the third table

Date (Key Time)

Location (Case Key)

Sales

2/15/2006

Redmond

100

2/15/2006

Bellevue

200

2/16/2006

Redmond

300

2/17/2006

Bellevue

400

2/17/2006

Redmond

500

Here the data for Sales.Bellevue is missing in the middle of the aligned time segment , i.e. for 2/16/2006.

Again the M_V_S must be defined or the missing row must be supplied.

The bottom line is, the user should be mindful that they may well get (many) more series than there are columns in the data table and all those series need to be aligned (at minimum - right-aligned and without gaps). I believe that when TS complains about missing data, that is for good reason, so either the data needs to be cleaned or MISSING_VALUE_SUBSTITUTION needs to be defined.

RamanIyer at 2007-9-8 > top of Msdn Tech,SQL Server,Data Mining...
# 2

Thank you for the reply, but it is not quite what I mean.

Essentially I have 2 case keys (though one is not being defined as such, because the algorithm doesn't allow it). To use your example, it would be like having Location as the actual Case key, but an additional Store ID column, with 2 stores in Redmond and another 3 in Bellevue. Each of these stores would (usually) have monthly statistics, so there will be multiple rows with the same date for Richmond. Is this permissable or is it going to produce nonsense results?

As I said, I want to look for patterns that may be different in each store.

In relation to the error message, I am sure the series has no missing dates, so perhaps it is a caching issue (I have noticed odd things when I make changes to a db but they seem to be ignored, even if I refresh the view). I will try a very simple sequence to see if it is reproduceable.

Thanks,

Neil.

NeilMunro at 2007-9-8 > top of Msdn Tech,SQL Server,Data Mining...
# 3

In this case, you can make the case key composite by either

a) binding it to multiple KeyColumns (you can do this in the in the Mining Structure Editor in BI Development Studio by going to KeyColumns in the Properties pane for a mining structure column, clicking on the "..." button and adding multiple source columns.

or

b) creating a computed column in your DataSourceView that combines the two source columns and binding to the computed column in your mining structure

In both cases, the case key is a single column from the mining structure's perspective but it's bound to multiple source columns.

RamanIyer at 2007-9-8 > top of Msdn Tech,SQL Server,Data Mining...

SQL Server

Site Classified