Developing a Distributed Workflow

Hi group members

Hope you are all doing great with WWF. Has anybody tried a distributed (but tightly coupled, since the complete objects/context may also need to be passed) workflow in WWF? For example, a couple of activities run on Machine A, then others on Machine B (see the figure below).

[Figure: Activity 1, Activity 2 and Activity 3 spread across Machine A, Machine B and Machine C]
This way we can delegate time- or resource-expensive activities to other machines (based on certain rules), and yet the workflow will complete like a normal WWF workflow.

Any ideas?

By [THERAZI] at [2008-2-15]
# 1

Hi,
I like your idea, but it might be quite challenging to implement it.

If you really need it today, I would suggest using BizTalk 2004.
Scaling out and in works right out of the box, and for quite a fair price.

YvesLorphelin at 2007-9-9 > top of Msdn Tech,Software Development for Windows Vista,Windows Workflow Foundation...
# 2

Hmm, BizTalk, I knew that was coming :) Well, actually I don't need to do it urgently;
I am looking for some practical scenarios where we have to distribute the load over several machines. I am also inclined to design a simple platform/framework on top of WWF for this and install it on every machine where we want to execute the activities. How about that idea?

THERAZI at 2007-9-9
# 3
Hi Razi,
I've been working on a framework to support distributed computing workflows for the past 6 months. We are currently evaluating WinWF and will determine whether we should migrate the framework to it.

Distributed workflows are an extremely powerful concept, especially if you have a library of components (activities) you can wire into your workflows.

LaVinci

LaVinci at 2007-9-9
# 4
I was sure you would see it coming ;)

Anyway, if I were to try right now, I would probably examine how and when a process instance's state is persisted, and have multiple process hosts try to save and instantiate process instances.

With my current limited knowledge of WWF, the only way to control state persistence programmatically is to call
WorkflowInstance.Unload(); to save it
WorkflowRuntime.GetWorkflow(Guid id); to reload it.

There is also a property on the WorkflowRuntime, UnloadOnIdle: it controls whether the instance is unloaded from memory when it is idle.

You could then try to have your runtime host save the state automatically when you reach certain activities, and have another runtime host process pick it up.

But then you should be careful with transaction scopes...

If you want load balancing, I would have the process instances load balanced, not the activities.
Having instances running on different hosts can also serve as failover.

If a particular set of activities does need to run on a different host, I would probably factor out those activities: put them in a different workflow definition running in a different host, and have it called by the main workflow.
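To make the "factor it out" idea concrete, here is a minimal sketch. It is a language-neutral model written in Python, not WF code: `RemoteHost`, `heavy_child_workflow` and `main_workflow` are hypothetical names, and a worker thread stands in for the second machine.

```python
from concurrent.futures import ThreadPoolExecutor


def heavy_child_workflow(payload):
    """Hypothetical expensive sub-workflow, factored out of the main one."""
    return sum(x * x for x in payload)


class RemoteHost:
    """Stand-in for a second machine; a single worker thread plays that role."""

    def __init__(self, name):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=1)

    def invoke(self, workflow, payload):
        # Returns a future so the caller decides when to wait for the result.
        return self._pool.submit(workflow, payload)

    def shutdown(self):
        self._pool.shutdown()


def main_workflow(data, remote):
    cleaned = [x for x in data if x >= 0]                   # Activity 1, local
    future = remote.invoke(heavy_child_workflow, cleaned)   # Activity 2, delegated
    result = future.result()                                # wait for the child to finish
    return {"count": len(cleaned), "sum_of_squares": result}  # Activity 3, local


if __name__ == "__main__":
    host_b = RemoteHost("B")
    print(main_workflow([3, -1, 4], host_b))
    host_b.shutdown()
```

The main workflow only sees a call and a result, so the delegated part can move to another host without the main definition changing.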

I think it is not trivial to have specific activities run on different hosts, and I really fail to see why you would need it. Could you provide an example?

YvesLorphelin at 2007-9-9
# 5
I agree with Yves and it should be possible to build a host for Windows Workflow Foundation that spans multiple machines and provides load balancing. Windows Workflow Foundation is a developer platform and not a server product and doesn't have a server that supports this out of the box.

I'd suggest that you probably should write your own persistence service to do this and let the workflow runtime UnloadOnIdle when it chooses. I'd be very interested in hearing from someone who has tried this and wants to share. Contact me on my blog.

PaulAndrew at 2007-9-9
# 6

Although this is purely an application-owned issue, here are some ideas on how to do distributed workflows.

1. Use a shared database for the persistence store, so that you can start a workflow on one machine, unload it, and shut down the workflow runtime on that machine. Then bring up a workflow runtime on another machine, load the same instance there using WorkflowRuntime.GetWorkflow() (somehow transferring the GUID to the second machine), and continue running.

2. Use suspend activities as points at which you can request an unload.

3. Use different AppDomains (for hosting different WorkflowRuntimes) and different database stores to manage instances of such distributed workflows.
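As a rough illustration of idea 1, here is a minimal model of the handoff. It is Python rather than WF code, and everything in it is an assumption made for the sketch: `SharedStore`, `Host` and `STEPS` are hypothetical names, a pickled in-memory dictionary stands in for the shared persistence database, and `run` plays the role of load, execute, unload.

```python
import pickle
import uuid

# Hypothetical three-activity workflow; each step mutates the workflow context.
STEPS = [
    ("Activity 1", lambda ctx: ctx.update(total=ctx["total"] + 1)),
    ("Activity 2", lambda ctx: ctx.update(total=ctx["total"] * 10)),
    ("Activity 3", lambda ctx: ctx.update(total=ctx["total"] - 3)),
]


class SharedStore:
    """Stand-in for the shared persistence database (in-memory here)."""

    def __init__(self):
        self._rows = {}

    def save(self, instance_id, state):
        # Serialize the state, as a real store would before writing a row.
        self._rows[instance_id] = pickle.dumps(state)

    def load(self, instance_id):
        return pickle.loads(self._rows[instance_id])


class Host:
    """Stand-in for a workflow runtime running on one machine."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def start(self, context):
        instance_id = uuid.uuid4()
        self.store.save(instance_id, {"next": 0, "context": context, "trace": []})
        return instance_id

    def run(self, instance_id, stop_before):
        """Load the instance, run steps up to stop_before, then unload it."""
        state = self.store.load(instance_id)
        while state["next"] < min(stop_before, len(STEPS)):
            name, action = STEPS[state["next"]]
            action(state["context"])
            state["trace"].append((self.name, name))
            state["next"] += 1
        self.store.save(instance_id, state)
        return state


if __name__ == "__main__":
    store = SharedStore()
    machine_a, machine_b = Host("A", store), Host("B", store)
    wid = machine_a.start({"total": 0})        # only this id travels between machines
    machine_a.run(wid, stop_before=2)          # Activities 1 and 2 on machine A
    final = machine_b.run(wid, stop_before=3)  # Activity 3 on machine B
    print(final["trace"], final["context"])
```

Only the instance id has to be passed between machines; the context itself travels through the shared store.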

I'm just throwing ideas out; I would be glad to hear from anyone who tries this.

# 7
What is the intent or objective behind wanting to distribute the workflow on a per-"activity" basis?

I'm not suggesting it is a bad idea... I'm curious about the specific reasons.

For example, there's extremely low overhead in executing the core workflow in WWF, so is it some additional per-activity processing that you want to distribute over multiple servers (over and above the core workflow itself)?

Michael.

# 8
Some computational problems don't fit within the space of a single processor. You simply run out of 'wall clock' time.

An example of this might be collecting programming information from all your customers' set-top cable boxes, transforming the data in some way, and then loading it into your favorite database for analysis. I'm sure the marketing department would want all this information ASAP; let's say we are in the middle of the World Series and I'm trying to sell more ads.

So creating a workflow that scales out is one way to handle this, I suppose.

I'd like to be able to create a workflow and, as an afterthought, make it parallel. That is, I would like to assign 'HOT' flows (groups of activities that are processor bound) to one or more systems. Yes, this is more complicated to manage and develop, but it would be nice to have support in the native framework.

Using this type of distribution we can perform very large processing jobs.

As another example, let's say I want to perform real-time click-stream analysis on a very active web search engine.

LaVinci.

LaVinci at 2007-9-9
# 9

What I was getting at is that there is a difference between "distributing the work" across multiple servers and "distributing the workflow" across multiple servers.

Razi: What is/are the scenario(s) you had in mind?

Michael.

# 10

Why not use the usual remoting techniques to get to each activity (e.g. queues, remoting, Indigo, web services)?
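A queue-based version of that could look roughly like the sketch below. It is only a model in Python: `activity_worker` and `run_distributed` are hypothetical names, threads stand in for the remote machines, and in-process queues stand in for whatever message transport you would really use.

```python
import queue
import threading


def activity_worker(machine, requests, replies):
    """Hypothetical remote activity host: pull a request, do the work, reply."""
    while True:
        message = requests.get()
        if message is None:  # sentinel: no more work for this machine
            break
        activity_id, payload = message
        replies.put((activity_id, machine, payload.upper()))


def run_distributed(payloads, machines):
    """Send one request per activity and collect the replies in activity order."""
    requests, replies = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=activity_worker, args=(m, requests, replies))
        for m in machines
    ]
    for t in threads:
        t.start()
    for activity_id, payload in enumerate(payloads):
        requests.put((activity_id, payload))

    # Replies may arrive in any order, so correlate them by activity id.
    results = {}
    for _ in payloads:
        activity_id, _machine, output = replies.get()
        results[activity_id] = output

    for _ in machines:  # one sentinel per worker so each thread exits
        requests.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(payloads))]


if __name__ == "__main__":
    print(run_distributed(["load", "transform", "store"], ["B", "C"]))
```

Which machine handles which request is not fixed here; the workflow side only relies on the activity id that comes back with each reply.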

I personally would use BTS 2004/2006 to provide a unified orchestration to manage the whole lot (especially in a distributed environment).

BTS is more than up to this task. It's built for these scenarios, so you can get on with writing the actual workflow and not spend a huge effort on the 'distributed' architecture.

Cheers,

Mick.

MickBadran at 2007-9-9