This is Alan R.'s Typepad Profile.
Alan R.
Colorado
Long-time geek and founder of Assimilation and Linux-HA projects with interests in managing computers, particularly monitoring, discovery and availability.
Recent Activity
Something I should have made a note of - it automatically configures _monitoring_, not _alerting_. It's much harder to figure out how to alert for a particular service than it is to figure out how to monitor it. Monitoring is apolitical and mostly independent of company, but alerting is not. Who to contact, when to contact them, what's the priority of this service? All good alerting questions. But you don't need to know the answers to these questions to monitor it.
Rules to automatically monitor servers using init scripts
From a monitoring perspective, one of the most exciting possibilities in the Assimilation project comes from the integration of monitoring and discovery. We've recently implemented the rules which will cause services to be automatically monitored once they're discovered. In other words, you don...
I'm planning on taking a good bit of time off to make the first Assimilation release available. I'd describe its current state as a well-established proof-of-concept. Hopefully in the time I have before the end of the year, I can make it into a release worth having others try out. Feel free to play with it as is.
Zero Configuration Discovery and Server Monitoring in the Assimilation Monitoring Project
Wouldn't it be wonderful if you could just drop a monitoring package onto your servers with no configuration at all, and it all started up, began monitoring your servers, discovered your services, dependencies and switch port connections all without you doing anything - with practically no load ...
Alan R. added a favorite at New DivaBlog
Oct 20, 2012
Doing this right is definitely a pain in the you-know-what. It will require:
(0) Renumbering the IP addresses of all your switches as noted above
leaving room for all your servers AND their virtual IPs as well as fixed IPs
(1) Enabling LLDP or CDP in your network
(2) Writing a piece of software to intercept/read the incoming LLDP or CDP packets
on all the interfaces you expect CDP or LLDP on
(3) Integrating this CDP/LLDP reader into the bootup process to assign IP addresses
as the packets arrive, and terminate when all the requested interfaces are assigned
(4) Doing a LOT of testing to make this all work.
Obviously, you'd want to start with a small test network to use in creating and testing this software.
Regarding Disaster Recovery - that's a complicated issue - not particularly related to cloud computing. In most cloud computing scenarios, it is unlikely that you'd have _any_ influence over IP address assignments.
Maintainable IP address assignment for clouds and large clusters
This post describes a more maintainable method than normal DHCP for automatically assigning host names and IP addresses to servers which is ideal for cloud computing and large clusters. It assigns them according to the location of the server, and requires zero administrative effort when you add...
Thanks! I fixed it. Not sure how that snuck in...
Assimilation Monitoring LinuxCon Video
I mentioned a few weeks ago that my talk at LinuxCon in San Diego had been very well received. Thanks to some good friends, we also created a video of the event, and this week I want to point you to the final cut of that video. This talk is a great introduction to the Assimilation Monitoring P...
My code in the Assimilation project monitoring system knows how to decode LLDP and CDP packets, is lightweight, and doesn't enable promiscuous mode on Linux. What you really want is to listen on all your interfaces at once, and enable each as the packet(s) come in. LLDP has this annoying property that the 9-bit TLV length is split across two bytes, taking one bit from the byte that holds the TLV type.
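To illustrate the split length field mentioned above, here is a minimal Python sketch of LLDP TLV header decoding. It is not the Assimilation project's code; the function names `parse_lldp_tlvs` and `make_tlv` are made up for this example, and the input is assumed to be the LLDPDU bytes that follow the Ethernet header. Each TLV starts with a 16-bit big-endian header holding a 7-bit type followed by a 9-bit length.

```python
import struct


def make_tlv(tlv_type: int, value: bytes) -> bytes:
    """Build one LLDP TLV (hypothetical helper, used here to create test input)."""
    if not 0 <= tlv_type <= 127:
        raise ValueError("TLV type must fit in 7 bits")
    if len(value) > 511:
        raise ValueError("TLV value must fit in a 9-bit length")
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


def parse_lldp_tlvs(payload: bytes):
    """Yield (tlv_type, value) pairs until the End-of-LLDPDU TLV (type 0)."""
    offset = 0
    while offset + 2 <= len(payload):
        (header,) = struct.unpack_from("!H", payload, offset)
        tlv_type = header >> 9       # upper 7 bits of the 16-bit header
        length = header & 0x01FF     # lower 9 bits, spanning both header bytes
        offset += 2
        value = payload[offset:offset + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        offset += length
        if tlv_type == 0:            # End of LLDPDU
            break
        yield tlv_type, value


if __name__ == "__main__":
    # Chassis ID (type 1), Port ID (type 2), TTL (type 3), End (type 0)
    pdu = (make_tlv(1, b"\x04" + bytes(6)) + make_tlv(2, b"\x05eth0")
           + make_tlv(3, b"\x00\x78") + make_tlv(0, b""))
    for tlv_type, value in parse_lldp_tlvs(pdu):
        print(tlv_type, value.hex())
```

Masking with 0x01FF rather than reading only the second byte is what picks up the ninth length bit, so values longer than 255 bytes decode correctly.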
Maintainable IP address assignment for clouds and large clusters
This post describes a more maintainable method than normal DHCP for automatically assigning host names and IP addresses to servers which is ideal for cloud computing and large clusters. It assigns them according to the location of the server, and requires zero administrative effort when you add...
This is good information to know. I don't currently expect my problem domain to involve creating large numbers of nodes of the same type in a short period of time on a routine basis. That would imply that lots of new hardware showed up all at once - which is likely to occur only during initial installation. But I much appreciate your expert opinion, and will keep your advice in mind as we go forward. Thanks Much!
An Assimilation type schema in Neo4j
This week I want to talk about an aspect of the Assimilation database schema which is somewhat controversial, an aspect of the schema for which the jury is still out. I chose to represent the Assimilation node type hierarchy with relationships which currently serve no purpose other than to repre...
Hi Luannem,
Thanks for your comments! Sorry I was so slow to respond to them :-(.
If you hang in there, you'll get more posts like this. I'm currently making about one a week - and I have at least 3 more weeks (after today) of posts for which I already know what to say.
Because of your interests, maybe you should join the Assimilation mailing list: http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation
Managing Computer Systems with Dependency Information
I haven't written much lately - because I've been quite busy writing tons of code for the new Assimilation Monitoring Project - which will start showing up here in my blog much more frequently - beginning with this post. From the perspective of managing a set of applications, a single tenant dat...
Thanks Peter! I have more articles like this lined up. I'll post one of them tonight - tomorrow morning your time ;-).
I really wasn't sure how to solve this problem - but when I came up with the idea for "variable relationship names" - it seemed to be a good compromise. I rarely want to know all the memberships, and often want to know about specific memberships - so this seemed like a reasonable approach for making things easy and getting maximum advantage from Neo4j.
Assimilation Ring Neo4j Schema
In this exciting episode of the Assimilation Project Neo4j schema series, I'll go over how the monitoring ring structures (which are key to its scalability) are represented in Neo4j. This is in some ways pretty simple, and some ways kind of clever. Either way it is one of the more critical par...
Thinking about this design - it seems to me it's more like DHCP than any other internet protocol that I can think of, since it's centrally managed and has other things in common. For example...
When we boot up, we send a multicast/broadcast packet asking for someone to tell us who we are and how we should be configured (analogous to getting DNS entries and so on from DHCP). Like DHCP clients, our machines "renew their leases" periodically - except we do it a *lot* more often. Instead of measuring lease renewal times in minutes, hours or days, we measure them in seconds (or potentially even in fractions of seconds). To compensate for this, we distribute our "dhcp server analog" throughout the network.
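As a rough illustration of the "lease renewal" analogy, the sketch below tracks short leases and reports hosts whose lease has run out. It is only an assumption-laden toy, not the proposal's design: the `LeaseTable` class, its method names, and the 3-second lease are invented for the example, and it runs in a single process rather than being distributed throughout the network.

```python
import time


class LeaseTable:
    """Track per-host leases; a host that stops renewing is presumed dead."""

    def __init__(self, lease_seconds: float, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.expires = {}           # host name -> time at which its lease ends

    def renew(self, host: str) -> None:
        """Record a renewal (heartbeat) from host, extending its lease."""
        self.expires[host] = self.clock() + self.lease_seconds

    def expired(self) -> list:
        """Return the hosts whose lease has run out, in sorted order."""
        now = self.clock()
        return sorted(h for h, t in self.expires.items() if t <= now)


if __name__ == "__main__":
    fake_now = [0.0]
    table = LeaseTable(lease_seconds=3.0, clock=lambda: fake_now[0])
    table.renew("server-a")
    table.renew("server-b")
    fake_now[0] = 2.0
    table.renew("server-a")         # server-b misses its renewal
    fake_now[0] = 3.5
    print(table.expired())
```

With leases this short, the renewal traffic is the main cost, which is why the comment above talks about spreading the server role across the network instead of keeping it central.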
Really Big Clusters: A Scalable membership proposal
This blog entry is a bit different than previous entries - I'm proposing some enhanced capabilities to go with the LRM and friends from the Linux-HA project. I will update this entry on an ongoing basis to match my current thinking about this proposal. This post outlines a proposed server live...
That's an interesting (and thoughtful) comment. This doesn't look very much like Linux-HA (or Pacemaker) - and it's a long way from a complete solution. An OSPF network with 10K routers is a very big OSPF network. (not 10K hosts, or 10K switches - they don't participate in OSPF).
What specific way do you think it should look more like OSPF?
Here are my off-the-cuff thoughts on this question...
How is this problem like OSPF? - it is trying to manage liveness, it is trying to be local network topology aware and network efficient.
How is this problem different from OSPF? It's not trying to solve the "let's help independent fiefdoms work together" problem. At this level, all machines are "owned" by the same owner. It is not trying to provide anything more than liveness (it's trying to solve a simpler problem). There is no distributed control (at this level).
Really Big Clusters: A Scalable membership proposal
This blog entry is a bit different than previous entries - I'm proposing some enhanced capabilities to go with the LRM and friends from the Linux-HA project. I will update this entry on an ongoing basis to match my current thinking about this proposal. This post outlines a proposed server live...
It's worth noting that Pacemaker (a child project of Linux-HA - formerly called the Linux-HA CRM) does implement this convenient type of health monitoring - that applies to every resource on the machine.
Using virtualization to provide "HA at wholesale"
Traditionally, the way people have implemented high availability is by using a high-availability management package like Linux-HA[1], then configure it in detail for each application, file system mount, IP address and so on. This traditional method works quite well, but can be a bit labor inten...
Alan R. is now following The Typepad Team
Mar 15, 2010
I spent the first 20 years of my career working for Bell Labs on exactly those kinds of highly redundant systems. They've been largely abandoned because they are too expensive, and to get the benefit from them they need special software. Ditto for the Tandem systems - abandoned as too expensive.
Everything fails. EVERYTHING. You just have to wait long enough. Eventually the sun will burn out. The only question is what you're going to do when it fails...
Quite frankly, I think all HA cluster software (as it's been traditionally understood) is doomed. Virtualization makes redundancy and failover simple, and eventually it will make it easy - probably mainly through cloud computing.
Availability, MTBF, MTTR and other bedtime tales
If we let A represent availability, then the simplest formula for availability is: A = Uptime/(Uptime + Downtime) Of course, it's more interesting when you start looking at the things that influence uptime and downtime. The most common measures that can be used in this way are MTBF and MTTR...
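A small worked example of the availability formula, as a sketch only. The uptime/downtime function follows the formula quoted above; the MTBF/MTTR form, A = MTBF/(MTBF + MTTR), is the standard steady-state version and is assumed here rather than taken from the post. The function names and sample numbers are illustrative.

```python
def availability(uptime: float, downtime: float) -> float:
    """A = Uptime / (Uptime + Downtime), both in the same time unit."""
    return uptime / (uptime + downtime)


def availability_from_mtbf(mtbf: float, mttr: float) -> float:
    """Standard steady-state form: A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def downtime_minutes_per_year(a: float) -> float:
    """Expected downtime per 365-day year for availability a."""
    return (1.0 - a) * 365 * 24 * 60


if __name__ == "__main__":
    a = availability_from_mtbf(mtbf=999.0, mttr=1.0)   # hours, illustrative
    print(f"availability: {a:.3%}")
    print(f"downtime/year: {downtime_minutes_per_year(a):.1f} minutes")
```

At "three nines" (A = 0.999) this works out to roughly 525.6 minutes, or about 8.76 hours, of downtime per year. Halving MTTR improves availability about as much as doubling MTBF does.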