Wolfpack, v3 - your input appreciated!

Coordinator
Oct 9, 2012 at 6:42 AM
Edited Oct 10, 2012 at 8:36 AM

This discussion is designed to help shape the next major release of Wolfpack and I would love to hear your ideas, gripes and thoughts on what it should consist of.

I have some core thoughts on it and will detail them here over the next few weeks. Suffice to say, the main thinking behind this release is "back to basics". I want to fragment the "core" and slim it down, making some of the components optional installs; others will be culled or replaced with enhanced versions. The Geckoboard Activity and the communications components (WCF and NServiceBus) will get the majority of the attention.

Moving out of core (becoming optional installs via contrib project)

  • Bus (NServiceBus) (possibly remove altogether)
  • Growl
  • Geckoboard
  • LogParser
  • Owl
  • HomeAutomation
  • BuildAnalytics
  • AppStats

Replacing

  • WCF Publisher -> replace with a ServiceStack API (to support offline mode and security) plus Razor MVC pages to show configuration, recent alerts etc

Including in Core (as dependent NuGet packages when installed via chocolatey)

  • Contrib.Console
  • Contrib.Email

Over to you...thoughts, ideas on what you would like to see in v3?

Cheers,

James

Oct 9, 2012 at 9:00 PM

A big issue I am trying to deal with is log management. Not just monitoring for certain criteria, but consolidating and making logs available to other departments. I think a Log Tail Activity (Event Logs, text file, etc) paired with a Publish (SQL, Loggly, HTTP Post) would facilitate that.

I think one feature that's lacking attention in the current version is Alert management. I currently have Wolfpack watching about 100 web URLs every 90 seconds. If the host running those tests loses connectivity or something and all those alert every 90 seconds, my inbox is going to explode. Most monitoring solutions have some escalation functionality and at minimum an option to notify on change of status only, not every failure. It might be possible to work something like that into 2.x, but I think it would be nearly impossible to manage.

Finally, this may be a separate project, but I've been thinking of employing the PowerShell DSL solution Doug Finke (http://blogs.technet.com/b/heyscriptingguy/archive/2011/02/17/use-powershell-to-simplify-working-with-xml-data.aspx) has described for building and maintaining the Wolfpack XML config files.

Coordinator
Oct 9, 2012 at 9:33 PM

Hi, thanks for the feedback and these are very good points.

1) Alert management - yup, this is a serious problem in the current version. There is a new feature called "Notification Modes" that is code complete, already committed to the source repo, and will be released next in v2.5. Essentially this allows you to inject logic into the alert path, and it is extensible so you can drop in your own logic to control things too. I am in the process of replacing the old "PublishOnlyIfFailure" config switch (bool) that many checks have with "NotificationMode" (string); this setting connects you to the associated NotificationMode plugin. Check out Core\Filters\Notification in the source - there are plugins for "FailureOnly", "SuccessOnly", "StateChange" and "StateChangeNag". StateChange will only generate an alert when the state flips from one to another, so it stays quiet even if the check remains in a failed state. The "Nag" version will continue to send alerts while the state remains failed and even allows you to define the interval at which they appear. At the moment I have converted only one check to use NotificationMode, WmiProcessRunningCheck, but I am busy doing the rest as we speak.

2) Config...yes, always a pain, and on the drawing board are plans for some sort of configuration UI app. I am also thinking of making the configuration independent of the IoC container (it's Castle config XML) so it can be made much simpler. I think this would be a vital step so that alternative configuration mechanisms could be used. I'll take a look at the link as I am keen to improve this area and this could be a quick win.
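
To make the notification modes from point 1) concrete, here is a minimal sketch in Python. It is not Wolfpack's actual C# plugin code: the class name, the `should_publish` method and the choice to treat an unseen check as healthy are all assumptions made for illustration. With no nag interval it behaves like "StateChange"; with one it behaves like "StateChangeNag".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StateChangeNag:
    """Hypothetical filter: alert on a state flip, optionally nag while failed."""

    nag_interval: Optional[float] = None  # seconds; None means plain StateChange
    _last_ok: Dict[str, bool] = field(default_factory=dict, init=False, repr=False)
    _last_sent: Dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def should_publish(self, check: str, ok: bool, now: float) -> bool:
        # Assumption: a check we have never seen is treated as healthy,
        # so the first success is silent and the first failure alerts.
        previous = self._last_ok.get(check, True)
        self._last_ok[check] = ok
        if previous != ok:
            # The state flipped (healthy -> failed or failed -> healthy).
            self._last_sent[check] = now
            return True
        if ok or self.nag_interval is None:
            # Still healthy, or still failed with nagging switched off.
            return False
        if now - self._last_sent[check] >= self.nag_interval:
            # Still failed and the nag interval has elapsed.
            self._last_sent[check] = now
            return True
        return False


# A check that fails for a while (polled every 90 seconds) and then recovers.
results = [(0, False), (90, False), (180, False), (600, False), (690, True)]

nag = StateChangeNag(nag_interval=600)
nag_sent = [t for t, ok in results if nag.should_publish("web-01", ok, now=t)]

plain = StateChangeNag()
plain_sent = [t for t, ok in results if plain.should_publish("web-01", ok, now=t)]
```

In this run the plain mode alerts only at the failure and at the recovery, while the nag mode adds one reminder once 600 seconds have passed since the last alert.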

Cheers, appreciate your input!

James

Coordinator
Oct 10, 2012 at 8:34 AM

Regarding the consolidated/tail log feature - so it periodically runs a logparser query capturing aggregated data over some range (last n/timespan) and pushes an alert if any rows are returned - this alert also contains some identifier/link to allow you to reconnect with the data later. The actual data captured by the query is stored and referenced by the identifier embedded in the alert.

To view the data it could be accessed as a stream via a web API method or a simple HTML viewer page. Finally, I guess there would need to be some management API to allow deletion of stored data, or some periodic cleanup plugin executing a retention policy.

Is that getting close to what you need?

Oct 10, 2012 at 1:57 PM

I guess to be more clear I should have just said I would love to have LogStash like functionality in Wolfpack.

What you described is pretty much it, but having a dependency on LogParser limits what log formats you can deal with. For example, it's impossible to consume a Log4Net multi-line log entry with a stack-trace. I guess if there was a LogReader base class that you could build on and use LogParser where it works, that could work.
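
A rough sketch of what such a LogReader base class could look like, in Python for brevity. Everything here is hypothetical: the class names, the `entries` method and the assumption that each new entry starts with a `yyyy-MM-dd` timestamp (the real pattern depends on the Log4Net layout in use). Lines that do not start a new entry, such as a stack trace, are attached to the entry before them.

```python
import re
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class LogReader(ABC):
    """Hypothetical base class: turns raw lines into complete log entries."""

    @abstractmethod
    def entries(self, lines: Iterable[str]) -> Iterator[str]:
        ...


class MultiLineReader(LogReader):
    """Groups continuation lines (e.g. a stack trace) with the entry that began them."""

    def __init__(self, entry_start: str = r"^\d{4}-\d{2}-\d{2} ") -> None:
        # Assumption: every new entry begins with a date stamp.
        self._entry_start = re.compile(entry_start)

    def entries(self, lines: Iterable[str]) -> Iterator[str]:
        current: List[str] = []
        for line in lines:
            line = line.rstrip("\n")
            if self._entry_start.match(line) and current:
                # A new entry begins, so the previous one is complete.
                yield "\n".join(current)
                current = []
            current.append(line)
        if current:
            yield "\n".join(current)


sample = [
    "2012-10-10 13:57:01 ERROR Order failed",
    "System.InvalidOperationException: boom",
    "   at Shop.Orders.Save()",
    "2012-10-10 13:57:02 INFO Retrying",
]
entries = list(MultiLineReader().entries(sample))
```

A LogParser-backed reader could sit behind the same base class for the formats LogParser handles well, with readers like this one covering the rest.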

It might make this overly complicated, but Check plugins that watch this stream could help Wolfpack avoid double-dipping into the source logs to read the rows and do a scalar check.

I think a cleanup plugin would be sufficient for managing the data growth, but personally, I would probably pump the data into some log management application like Loggly.com that would deal with the data for me. I've just found that most log collection applications/plugins suck for Windows hosts because they're usually ported from some *nix platform.

In the end, what I'm looking for is an easy way to aggregate logs from different machines into an archive that I could mine for business intelligence. 

Oct 19, 2012 at 5:19 PM

So, I think this is a pretty cool project. It has a lot of interesting capabilities. The current release, to me, has something of an identity crisis. When I think of an agent, I think of something that specializes in local knowledge, i.e. running processes, file system information, local performance metrics. Wolfpack does this, but many of its health checks allow an agent to know things about its environment that definitely aren't local. I could use the database check to monitor a local SQL table, but I can also use it to monitor a remote SQL table. That's not a capability I typically associate with an agent. Agents typically report in to a "home base" that provides things like aggregation, administration, and the larger system or enterprise knowledge that an agent lacks. As I read it, Wolfpack really doesn't structure itself hierarchically, but in a more peer-to-peer fashion. Again, that flexibility is appealing, but it's going to be hard to grow in both those directions at once.

One of the most intriguing things about the solution to me is that the generic nature of the health checks makes it possible to extend the monitoring concepts to business "health" as well as IT "health". Particularly for smaller organizations with SQL-based ERP systems (or larger organizations with relatively inflexible SQL-based ERP systems), Wolfpack can detect problematic business conditions and surface them for proactive attention. For example, "did we just receive an invoice for a service that exceeded our estimate on the price?" Or if you have a legacy application that creates data integrity issues (e.g. orphan headers or orphan lines), you could detect those issues and address them before they screw up your end-of-month processing.

To be useful in this regard, Wolfpack would need to be much "smarter" in terms of how it lets you process events.  The Growl interface is very nice, but it assumes a one to one relationship between Agents and Growl Instances.  It would be nice if I could set it up so that multiple instances could be alerted.  But that's just the tip of the iceberg.  If we're going to create a rich landscape of alerts, we probably want a way to set up subscriptions and maybe even rules to help further define what gets fully surfaced and what just happens.

Of course, it's perfectly legitimate to say that this is a good idea, but not the scope of the product, but rather the scope of some other related product.  If that's the case, I'd keep the publishers pretty simple.  This would define the role of the agent as something that aggregates and standardizes information from different sources, but is really unconcerned with if or how the information is surfaced, beyond providing some hooks into mechanisms (dbs, buses, etc.) that are commonly used to preserve information.

Thanks for the opportunity to chime in....

Coordinator
Oct 24, 2012 at 9:59 PM

@rgarch - firstly, thanks for the input...really appreciated...and there is a lot to digest here, with some very valid points. To address the identity crisis (which I agree it has, to a degree) I want to strip Wolfpack back and provide plugins that better delineate the roles that Wolfpack can play.

There will certainly be better support for configuration (GUI and an API) and an extensible web api to interact with instances and for data exchange - this would also support visualisation plugins. I already have a demo SignalR plugin that broadcasts alerts/messages and this will become the default message distribution mechanism - this offers a lot of rich possibilities for visualisation and communication.

I will also be making a much better effort with the "AppStats" feature - this is the feature that currently allows you to monitor "business health".

BTW: Regarding the Growl interface, "It would be nice if I could set it up so that multiple instances could be alerted" - you can forward Growl notifications on to as many (LAN) connected PCs (also running Growl) as you like and set up rules for which message types go where...you can also forward to your (or others'!) smartphones too.

Thanks,

James

Coordinator
Oct 30, 2012 at 6:09 PM

How about IFTTT integration? Basically Wolfpack would need to publish alerts via Dropbox or RSS - these act as triggers to IFTTT actions - eg: Wolfpack Dropbox publisher uploads the alert (serialised as JSON/XML/Text whatever) and you then create a recipe to do something with it!
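
As a rough illustration of the publisher half of that idea, the Python sketch below writes each alert as a JSON file into a folder. It assumes the folder is one that a Dropbox client syncs and that an IFTTT recipe is set up to react to new files there; neither is shown. The function name and the alert fields are made up for the example, and a temporary directory stands in for the synced folder.

```python
import json
import tempfile
from pathlib import Path


def publish_to_folder(alert: dict, folder: Path) -> Path:
    """Write one alert as a JSON file named after the alert id."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{alert['id']}.json"
    target.write_text(json.dumps(alert, indent=2), encoding="utf-8")
    return target


# Stand-in for the synced folder (assumption: the real one is Dropbox-backed).
demo_folder = Path(tempfile.mkdtemp())
written = publish_to_folder(
    {"id": "alert-001", "check": "web-01", "result": False, "message": "HTTP 500"},
    demo_folder,
)
```

One file per alert keeps the trigger simple: every new file is a new event, and the recipe can decide what to do with the payload.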

Just a thought....!

Coordinator
Dec 20, 2012 at 10:43 AM
Edited Dec 20, 2012 at 5:39 PM

Ok - an update on v3...it's going well!

I hadn't intended for this to be such a big change internally but alas the best laid plans etc....so far I have....

  • Revamped the publishing system. 
    • On the publisher side I have consolidated the message type into a universal format "NotificationEvent" - this means you only need to implement a single publisher type (as opposed to the existing AgentStart and HealthCheckResult). 
    • Information to be published is strongly typed and is interchangeable with a NotificationEvent (via an INotificationEventCoreProperties interface). This means that NoSql stores like Mongo/Raven can simply store the message itself.
    • Messaging is now extensible - you can easily publish ANY type of message via the same publishers.
  • New NotificationHub component provides an extensible/plug-in based "hub" that all HealthCheck alerts are routed through. 
    • The plug-in system allows for rich logic to be injected to decide whether a healthcheck alert should be published.
    • There are plug-ins to cover the main filtering use-cases - eg: only publish alerts from a healthcheck based on its result.
    • Sophisticated plug-ins to shape and control a HealthCheck's alert output
      • Alerts can be set to publish only on a state change
      • Alert rate can be throttled (so you don't flood your publishers with the same alert). Custom throttle maps are also possible by implementing your own plug-in.
  • Simplified the growl publisher
    • Replaced Growl "Finalisers" with a simplified "Formatters" implementation.
    • Fully separated the formatting logic from the publisher into the new formatters.
  • New Artifacts feature - support for large datasets to be associated to a notification.
    • HealthChecks can now generate large datasets that support the Notification and this data can be persisted independently - eg: LogParser check can now create an artifact out of the actual report data that it created to generate the notification.
    • The notification can contain a link to download the artifact (via the new WebService API call, below)
    • Plug-in architecture allows you to provide custom artifact repositories (eg: database)
      • Currently supported is a "FileSystem" artifact repository - artifacts are saved to disk file and indexed by Notification Id (guid).
      • Plug-ins to serialise/deserialise the data to/from CSV and JSON format exist. 
  • New WebService API (based on ServiceStack)
    • Core Wolfpack API provides methods to...
      • Get Wolfpack Status (see what plug-ins are loaded, unhealthy, when it started etc)
      • Publish NotificationEvents to (when the agent is acting as a server in a distributed layout)
      • Download Artifacts (just provide the Notification Id)
      • ATOM feed of alerts (work in progress)
    • Razor views supported
    • Can be secured using HTTPS (and credentials)
    • Receive Notification method has an extensible plug-in pipeline architecture
      • Plug-ins for de-duplicating messages and handling message staleness
      • You can create custom plug-ins to enrich this "message received" pipeline
      • Client component to publish to the webservice is also plug-in based & robust - handles "no connection" to server and the entire "strategy" of how messages are delivered to the server agent can be customised.
    • Extensible - you can create your own ServiceStack service plug-ins and Razor views and these will operate alongside the core Wolfpack API.
      • eg: Build custom dashboards in HTML :)
  • Removed several projects and features from "Core" Wolfpack solution.
    • LogParser project relocated to wolfpackcontrib (optional download).
    • Geckoboard project relocated to wolfpackcontrib (optional download).
    • AppStats project (possible relocation to wolfpackcontrib)
    • BuildAnalytics project relocated to wolfpackcontrib (optional download).
    • HomeAutomation and Owl projects relocated to wolfpackcontrib (optional download).
    • WCF Publisher/Bridge dropped (in favour of the new WebService api/publisher)
    • NServiceBus support has been dropped as a publisher (in favour of the new WebService api/publisher).
      • Support for NServiceBus orientated checks remains in the wolfpackcontrib project
  • I've also started work on a project to extract all the Geckoboard related code into a new project that will provide a general purpose .Net api to Geckoboard and I'll be creating a new Wolfpack contrib project to host all the Geckoboard Wolfpack functionality.
    • Support for Geckoboard data push will be included
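
The Artifacts feature in the list above can be pictured with a small Python sketch. This is not the Wolfpack implementation: the class, method and function names are invented, and the only details taken from the description are that artifacts are saved to disk, indexed by the notification id (a guid), and serialised as CSV or JSON by pluggable serialisers.

```python
import csv
import io
import json
import tempfile
import uuid
from pathlib import Path
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


def to_json(rows: Rows) -> str:
    return json.dumps(rows)


def to_csv(rows: Rows) -> str:
    buffer = io.StringIO()
    fieldnames = list(rows[0].keys()) if rows else []
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


class FileSystemArtifactRepository:
    """Hypothetical repository: one file per notification id."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def save(self, notification_id: str, rows: Rows,
             serialise: Callable[[Rows], str], extension: str) -> Path:
        path = self._root / f"{notification_id}.{extension}"
        path.write_text(serialise(rows), encoding="utf-8")
        return path

    def load(self, notification_id: str) -> str:
        # Look the artifact up by id, whatever format it was saved in.
        matches = list(self._root.glob(f"{notification_id}.*"))
        if not matches:
            raise KeyError(notification_id)
        return matches[0].read_text(encoding="utf-8")


repo = FileSystemArtifactRepository(Path(tempfile.mkdtemp()))
notification_id = str(uuid.uuid4())
rows = [{"url": "/checkout", "errors": "17"}]
repo.save(notification_id, rows, to_json, "json")
restored = json.loads(repo.load(notification_id))
```

Because the notification only carries the id, the alert itself stays small and a database-backed repository could replace the file system one without the publishers noticing.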

 

As this is a pretty huge change to the internal workings I am going through a stabilisation period - then it will be on to fixing up all the contrib plug-ins (and associated documentation) - a priority is also completing the (alpha) SignalR plug-in to broadcast notifications to all connected clients.

More updates soon...!

Dec 20, 2012 at 3:51 PM

Fantastic stuff.

Given the expansion in customization, did you make any progress on config management? How much customization can be manipulated through config files versus writing compiled plug-ins?

Do you feel like you're nearing a beta stage? I'd like to start getting my head around this new architecture.

Coordinator
Dec 20, 2012 at 5:22 PM

One of the items on the todo list (https://trello.com/b/8gto1kYr) is to diagram the internal architecture - this should clearly illustrate the internal flow and extension/interception/customisation points available.

Should have a stable build in the next day or two and would welcome some additional testing.

On the config front I have some ideas but nothing concrete yet - I'm split between static configuration (eg: a UI to change config files, then restart the app) and dynamic configuration (building in a mechanism to push config to a live running service, no restart required). There are pros and cons on both sides and no obviously better approach, so opinions/options are welcome on this!

Coordinator
Jan 4, 2013 at 4:11 PM

Another quick update,

The core notification/messaging changes are complete and running well. Further simplifications have been possible so that ALL notifications (health checks, agent start) are routed via the NotificationHub component - this ensures a consistent route from creating a notification to its actual delivery to the publisher plug-ins and means you can attach custom plugins to intercept the message and run any logic you like to determine whether the message should actually be published.
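
The routing described here can be sketched in a few lines of Python. This is an illustration of the idea only, not the NotificationHub code: the `notify` method, the filter signature and the dictionary-shaped notification are assumptions. Each filter plugin can veto a message, and only messages that pass every filter reach the publishers.

```python
from typing import Any, Callable, Dict, List

Notification = Dict[str, Any]
Filter = Callable[[Notification], bool]   # return False to veto publishing
Publisher = Callable[[Notification], None]


class NotificationHub:
    """Hypothetical hub: every notification passes the filters before publishing."""

    def __init__(self, filters: List[Filter], publishers: List[Publisher]) -> None:
        self._filters = filters
        self._publishers = publishers

    def notify(self, notification: Notification) -> bool:
        # Filters run in order; the first veto stops the message.
        if not all(allow(notification) for allow in self._filters):
            return False
        for publish in self._publishers:
            publish(notification)
        return True


def failures_only(notification: Notification) -> bool:
    return notification["result"] is False


def throttle(min_gap: float) -> Filter:
    """Allow at most one notification per check every `min_gap` seconds."""
    last_sent: Dict[str, float] = {}

    def allow(notification: Notification) -> bool:
        check, now = str(notification["check"]), float(notification["time"])
        if check in last_sent and now - last_sent[check] < min_gap:
            return False
        last_sent[check] = now
        return True

    return allow


inbox: List[Notification] = []
hub = NotificationHub([failures_only, throttle(300)], [inbox.append])
for t, ok in [(0, False), (90, False), (180, True), (400, False)]:
    hub.notify({"check": "web-01", "result": ok, "time": t})
published_times = [n["time"] for n in inbox]
```

Here the failure at 90 seconds is throttled, the success at 180 seconds is filtered out, and only the failures at 0 and 400 seconds reach the publisher.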

I have also started work on a configuration feature. Highlights are...

  • Abstraction of the configuration definition - a plugin/repository pattern is used to load configuration definitions (currently file-system supported)...but this means you can store your configuration in any format with any persistence mechanism (eg: DB, spreadsheet).
  • Web API to interact/control the configuration - create, modify, remove configuration via JSON/REST calls.
  • Configuration Catalogue - plugins can advertise their configuration, these allow you to create new instances from these "template" configurations.
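
A minimal Python sketch of how these pieces could fit together. None of the names come from Wolfpack: `ConfigRepository`, `ConfigCatalogue`, the "UrlPingCheck" plugin and its settings are hypothetical. The repository hides where configuration is stored, and the catalogue holds the "template" each plugin advertises so that new instances can be created from it.

```python
import copy
from abc import ABC, abstractmethod
from typing import Any, Dict

Config = Dict[str, Any]


class ConfigRepository(ABC):
    """Hypothetical storage abstraction (file system, DB, spreadsheet...)."""

    @abstractmethod
    def load_all(self) -> Dict[str, Config]: ...

    @abstractmethod
    def save(self, name: str, config: Config) -> None: ...

    @abstractmethod
    def remove(self, name: str) -> None: ...


class InMemoryRepository(ConfigRepository):
    """Stand-in store for the sketch; a real one would persist somewhere."""

    def __init__(self) -> None:
        self._items: Dict[str, Config] = {}

    def load_all(self) -> Dict[str, Config]:
        return copy.deepcopy(self._items)

    def save(self, name: str, config: Config) -> None:
        self._items[name] = copy.deepcopy(config)

    def remove(self, name: str) -> None:
        del self._items[name]


class ConfigCatalogue:
    """Plugins advertise a template; instances are copies with overrides."""

    def __init__(self, repository: ConfigRepository) -> None:
        self._templates: Dict[str, Config] = {}
        self._repository = repository

    def advertise(self, plugin: str, template: Config) -> None:
        self._templates[plugin] = copy.deepcopy(template)

    def create(self, plugin: str, name: str, **overrides: Any) -> Config:
        template = self._templates[plugin]
        unknown = set(overrides) - set(template)
        if unknown:
            raise KeyError(f"unknown settings: {sorted(unknown)}")
        config = {**copy.deepcopy(template), **overrides}
        self._repository.save(name, config)
        return config


repo = InMemoryRepository()
catalogue = ConfigCatalogue(repo)
catalogue.advertise("UrlPingCheck", {"url": "", "interval_seconds": 90, "enabled": True})
catalogue.create("UrlPingCheck", "ping-home", url="http://example.com")
stored = repo.load_all()
```

A web API for create, modify and remove would then be a thin layer over `create`, `save` and `remove`, whatever the storage behind the repository is.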

More to follow....

Coordinator
Jan 29, 2013 at 2:12 PM

I've got most of the major new features working now - there is a lot of tidy up to do on the config side of things - essentially porting all the existing checks/publishers etc.

One major new feature is a SignalR activity - alerts are "realtime" broadcast to all connected clients...here is a screenshot of how a notification looks via the Activity tab in the new Wolfpack Web Interface (displayed on an iPhone)

https://twitter.com/followjimbobdog/status/296000538946961408/photo/1

...and one of the Status screen...

https://twitter.com/followjimbobdog/status/294920434288312320/photo/1

I hope to get this into a decent enough shape for checking into the repository this weekend.

Jan 29, 2013 at 9:33 PM

Images look great, I really welcome another way to interact with the agent / alerts. Using Growl works out OK and enables one to pull things together and even forward to another system, but it also requires that you have a user logged in to that system whether you are there in front of the terminal or not. Not an optimal setup.

Also I am really looking forward to revamping my monitoring setup using the new features around alert throttling and logic.

The config interface/UI sounds really promising. I am hoping I am reading it as you intended. Sounds like the UI will be able to take the plugin that I build and understand the UI elements needed to configure it. Of course my plugin has to do a good job of offering up the definitions as well. This will put the pressure on the plugin developer to define the config well.

Your efforts are much appreciated, Keep sloc-in away.

Coordinator
Jan 29, 2013 at 10:15 PM

Hi,

Yeh - the new NotificationHub addresses a long-term weakness of Wolfpack - it allows all messages to be intercepted, and plugins can decide whether the message should actually be published or not. The built-in ones should take care of most scenarios but you can roll your own to customise the logic exactly.

The config/ui is built on top of a ServiceStack api - the idea being that others can provide alternative implementations or you can even extend the api or web pages (think dashboard, graphs etc). It sounds like you have the right end of the stick though - you can provide an implementation of ISupportConfigurationDiscovery that provides a descriptor entity that effectively "advertises" your plugin configuration and makes it visible to the UI/api. The bit I am actually working on at the moment is the UI for creating and editing new plugin instances.

The biggest challenge will be refreshing all the docs!

Cheers,

James

Coordinator
Feb 25, 2013 at 8:51 PM

A first cut of the new core source has been uploaded to the repository and I've started to put some documentation together here: http://wolfpack.codeplex.com/wikipage?title=Wolfpackv3Docs

As I progress further with it I'll create more "Alpha Release Notes" so you can experiment with the source.