This is Chip Salzenberg's Typepad Profile.
Chip Salzenberg
Interests: evolution, cool technology, haskell, sf, martial arts
Recent Activity
"It shouldn't be *a* queue, it should be *many* queues with non-trivial relationships between them."
Fair enough. It's close enough to what I asked for, in my inchoate way.
"I assume "now" above means at the time of a *second* failure, even though you don't say so, because it would be absurd to expect data survival if you had originally written with W=1."
Use cases vary. In our case, we are underprovisioned enough by design that we do run W=1 (or its moral equivalent, single master) and watch our replication very carefully. Sometimes we lose data, but when it happens, we know which data and can replicate it from upstream in our overall process, simply by rewinding the upstream feed.
In this environment, knowing that replication is behind by (say) no more than a minute means that after a crash and master/slave swap, we need only rewind the upstream to a minute before the crash and we're all good. Uncertainty and arbitrarily delayed replication in the manner of Cassandra [and, I believe, GlusterFS?] defeat this strategy, and would require us to switch to a highly overprovisioned strategy that supports full traffic at W=2. We would rather not.
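The rewind arithmetic described above can be sketched in a few lines. This is only an illustration: the function name `rewind_point`, the `safety_margin` parameter, and the timestamps are hypothetical and not taken from any real system.

```python
from datetime import datetime, timedelta

def rewind_point(crash_time, max_replication_lag, safety_margin=timedelta(0)):
    """Earliest upstream timestamp to replay after a master/slave swap.

    Assumes replication was never more than max_replication_lag behind,
    so anything written before the returned time already reached the slave.
    """
    return crash_time - max_replication_lag - safety_margin

# Hypothetical example: crash at noon, replication known to lag <= 1 minute.
crash = datetime(2011, 6, 1, 12, 0, 0)
start = rewind_point(crash, timedelta(minutes=1))
print(start.isoformat())  # 2011-06-01T11:59:00
```

The calculation is only meaningful when the lag bound is actually known; with arbitrarily delayed replication there is no value to pass as `max_replication_lag`.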
As for my memory of anti-entropy, I did know about it at one time; my memory had simply faded. The fact of its having to re-read and re-write the world made it operationally indistinguishable from a mass read-repair, which is my excuse such as it is. Anti-entropy's use of Merkle trees accomplishes a reduction of network traffic, but not a significant change in strategies and weaknesses.
As for what I would consider -not- fatally flawed: I think I've been clear. I want to know for sure how far back in time to go before I reach an assurance that the data written then are safe, and I want that number to be kept as low as possible at all times, hopefully without manual intervention.
Why GlusterFS is Glusterfsck'd Too
Looking into GlusterFS, I find that its designers (like those of Cassandra) have failed to take replication seriously. They depend on read repair to trigger checks for file consistency... and, unbelievably, they don't even trigger complete repair automatically after a node has been disconnected...
That people are using something doesn't mean that thing is fit for the use. cf Windows.
I am curious how you know read repair is working. Seems to me that it could be working very badly and you might never know, unless you are so overprovisioned that write replication never fails you and nodes never die.
Full repair is conceptually identical to mass read repair, in that it compares what the nodes have and makes sure they end up sharing what any of them has. It doesn't require sending the full data, but that's not a relevant difference to me. It still requires READING the full data, and iops are the more precious resource.
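To make the read-versus-send distinction concrete, here is a deliberately simplified sketch of hash-based comparison between two nodes. It uses a flat, two-level hash tree (bucket digests plus a root) rather than a full Merkle tree, and it is not Cassandra's implementation; the function names, bucket count, and sample data are all assumptions for illustration.

```python
import hashlib

def leaf_hashes(store, buckets=4):
    """Hash every record into a fixed number of buckets.

    Returns (bucket digests, bytes read). Every key and value on the
    node is read in order to compute the digests.
    """
    digests = [hashlib.sha256() for _ in range(buckets)]
    bytes_read = 0
    for key in sorted(store):
        record = key.encode() + b"\0" + store[key]
        bytes_read += len(record)
        # Assign the key to a bucket by hashing the key itself.
        bucket = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big") % buckets
        digests[bucket].update(record)
    return [d.digest() for d in digests], bytes_read

def root_hash(leaves):
    """Single digest summarising all buckets."""
    return hashlib.sha256(b"".join(leaves)).digest()

def differing_buckets(leaves_a, leaves_b):
    """Indexes of buckets whose contents differ between two nodes."""
    return [i for i, (a, b) in enumerate(zip(leaves_a, leaves_b)) if a != b]

# Hypothetical data: two replicas that disagree on a single key.
node_a = {f"key{i}": b"x" * 1000 for i in range(100)}
node_b = dict(node_a)
node_b["key7"] = b"stale"

la, read_a = leaf_hashes(node_a)
lb, read_b = leaf_hashes(node_b)
sent = 32 * len(la)  # only the 32-byte digests cross the network
print(root_hash(la) != root_hash(lb), differing_buckets(la, lb))
print("bytes read on node A:", read_a, "bytes sent:", sent)
```

The digests exchanged are tiny compared with the data, but `leaf_hashes` still touches every record on each node, which is the iops cost the comment is pointing at.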
Why Cassandra is Unfit for Production
Cassandra is a tragedy. Database replication should be a queue, or otherwise kept strict track of. If a datum should be replicated from X to Y and Z, then if it hasn't gotten to Z yet, then it should eventually. The database is allowed to fail to replicate at first, but it is not allowed to ju...
To clarify what certainly looked like a backtrack, and maybe was: I don't need client-side crypto so requirements relating to it were not on my mind. Read repair as a vital part of replication is ludicrous in the absence of crypto, but if it's to make crypto better, and you've set yourself an arbitrary-byte-range write requirement, it's not ludicrous ... just inconvenient (modulo the notification issues you acknowledge).
Also, I never changed my definition of blob; we started with different definitions and now have achieved mutual understanding (if not respect). If you say they're equivalent for the theory, I don't know enough to argue.
OK, enough conversational housekeeping.
"In the operational long term, there should be a guarantee that the system will restore itself to N replicas in finite time when there are no new requests." I can entirely agree with this. But "It's not feasible to expect the same when new requests continue to come in, or to expect that the current replication state will be perfectly knowable at one place and time" is I think a straw man. Knowing "replication is keeping up generally, with a queue length max of 5 sec" is plenty precise for operational use. Similarly "single global order" is a straw man; I meant local order, as in, if node A has things to send to node B, it can keep a queue of them so they get delivered, in order, eventually.
Your adjective "non-deterministic" hits the nail on the head, I think. Queues provide determinism, but it's the determinism I'm missing, not the queues.
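A minimal sketch of the kind of local, per-peer ordering discussed above: node A keeps one FIFO per peer, never drops an undelivered write, and can report the age of the oldest pending item as its replication lag. The class `PeerQueue` and its methods are hypothetical and not drawn from Cassandra, GlusterFS, or any other real system.

```python
from collections import deque

class PeerQueue:
    """FIFO of writes that one node still owes to one peer (hypothetical)."""

    def __init__(self):
        self._pending = deque()  # (enqueue_time, key, value), oldest first

    def enqueue(self, now, key, value):
        self._pending.append((now, key, value))

    def drain(self, send):
        """Deliver in order; stop at the first failure so order is preserved."""
        while self._pending:
            _, key, value = self._pending[0]
            if not send(key, value):
                break  # peer unreachable: keep the item and retry later
            self._pending.popleft()

    def lag(self, now):
        """Age in seconds of the oldest undelivered write (0.0 if caught up)."""
        return now - self._pending[0][0] if self._pending else 0.0

q = PeerQueue()
q.enqueue(100.0, "k1", "v1")
q.enqueue(102.0, "k2", "v2")

q.drain(lambda k, v: False)   # peer down: nothing is dropped
print(q.lag(105.0))           # 5.0

delivered = []
q.drain(lambda k, v: delivered.append((k, v)) or True)  # peer back up
print(delivered, q.lag(106.0))
```

The operational number of interest is `lag()`: as long as it stays under a known bound, an operator knows how far back the unreplicated window can extend.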
It'd be neat if Cassandra's anti-entropy feature works entirely as you describe, but that doesn't mean my fire-and-forget description was inaccurate. Supposing a node entirely dies rather than simply being temporarily offline, a failed replication may have gone an arbitrary time without being noticed and retried, and now the node with the only good copy is gone.
As for why I'm doing this: I'm offended by Cassandra's marketing. Cassandra's boosters don't tell the truth, even if perchance they don't lie. Cassandra *isn't* ready for production; I wasted a lot of time on it, and I'm trying to save other people the same pain. Similarly I was considering GlusterFS for a project, and found it wanting for different but related reasons. Perhaps I should rename my blog "Mene Mene Tekel Parsin." Remember it was a king who got that message, so don't feel put down. Granted he wasn't a king for much longer.
PS: Calling Cassandra's performance issues with repair "warts" is very, very generous.
Why GlusterFS is Glusterfsck'd Too
"No, it doesn't handle partition/split-brain situations"
Well, that's something agreed. Good.
"Your PS about encrypted data etc. is only correct if you're always writing whole files, which is a tyro's assumption."
You'll note I said "blobs", not "files". Blobs can be sub-file elements. If the design of GlusterFS doesn't allow for encrypted sub-file blobs, that doesn't harm my argument. Perhaps such a limitation was accepted for good reason.
In any case, the concept of read repair is not at fault here; demanding some client cooperation is acceptable, if not ideal. It's the implementation's requirement that the client Just Has To Know when to do it that is the flaw: lack of notice when guarantees are not being met. How is a client supposed to know when a read repair pass is required? Perhaps more tellingly, when can a client know for certain that *no* read repair passes are required because connectivity has been perfect?
"At least ... vapor stage ..."
Want to help? Oh, sorry, I slipped into Darcy mode there for a second. If you mean to bring in an ad hominem argument, you're not doing very well. You should know more about what you criticize (he wrote with deliberate sarcasm). Perl 5 is hardly vaporware. It's not even dead. :-)
Why GlusterFS is Glusterfsck'd Too
PS - as long as the pure client-side encryption is -consistent- there is no need for read repair. Nodes can flood propagate the encrypted blobs just fine, without any need to understand them.
Why GlusterFS is Glusterfsck'd Too
GlusterFS replication is, without doubt, a joke. Supposedly offering replication without ensuring that when you need a complete and correct replica you will have it is nothing more than a joke. Unless it's bait-and-switch: "Sure we have replication. See, we write the data more than once! [usually] No problem!" Pick your explanation: incompetence or deception.
They claim 3.3 will be better. We'll see. They've hardly proven themselves trustworthy.
WRT Cassandra: If you had read my article and understood it, you'd understand that Cassandra's read repair still offers no guarantee that it worked because replication events triggered by the read repair are just as vulnerable to loss as the originals were. And you'll never know until it's too late.
Meanwhile, a Cassandra node repair is even *worse* than a typical fsck, because the *good* data have to be copied too. Have you experienced it yourself, on a machine at capacity (actually half capacity because Cassandra steals 50% of your disk space)? I have, which is why I know it's a joke.
Why GlusterFS is Glusterfsck'd Too
I did say that fds fall under the Utility Corollary. If you could know that every fd was a plain file, would your fd-manipulating code be simpler? Of course. So that's the Pretense Rule at work. But for the sake of utility Unix (and even more Plan 9) extended fds to cover lots of non-file things. The loss of simplicity is acceptable.
Salzenberg's Law of Pretense
Salzenberg's Law of Pretense: Trying to simplify technology by pretending a thing is something else always fails, because the pretense is itself a complication. Putting a mask on a thing does not remove the thing, it adds the mask. It thus requires new technologies to create the mask, identify...
Well, AB, that sounds excellent! I will watch for 3.3.
Why GlusterFS is Glusterfsck'd Too
Jeremy: You think they don't know? I've spoken to many of the devs in person. My company hired Riptano. It's not like this is a secret, it's their freaking DESIGN PHILOSOPHY: "Lose all the data you want--the user can always make more."
Why Cassandra is Unfit for Production
If mockery were not a powerful weapon, a lot of political battles would have gone differently.
The incredibly roundabout path to 6 as well as the 5/6 split open us to mockery, and are thus inherently problems, if only of image.
Perl 5? Perl 6? Perl X?
There's been quite a few postings lately about Perl branding and the relationship between the Perl 5 and Perl 6 projects. From what I can gather it seems there's frustration on the part of some people that the existence of the Perl 6 is somehow hurting Perl 5. The argument seems to go that pe...
"Flat, wobbly, chicklet-style keys with flimsy rubber dome contacts" ... thank you for giving our pain a name. That's exactly how it breaks down. So to speak.
Why I Only Use ThinkPads: Keyboards
By request, here's some more detail about why ThinkPads are uniquely qualified as tools for typing, which is what coders mostly do. I started ThinkPads with a 765, went through a couple of A30s and T2x and T4x models, today I use a T61p, and I am drooling over the T510, so I am walking this walk...
As for perltidy, I have never enjoyed working with automated layout systems. Smart editors, sure. But (1) any code so messed up that you need an indenter to read it probably isn't worth reading. Seriously. And (2) I follow TomC's maxim that logically parallel constructs should be visually parallel as well, and automatic indenters ruin that.
Perl Development Essentials
A recent question on the SF.pm mailing list about IDEs prompted me to think about what I depend on to code Perl. I've been spending some time recently in GUI IDEs for other languages, especially MonoDevelop and Visual Studio. MonoDevelop is snappy and helpful. VS is of course a bloated mess. ...
I've elaborated on the virtues of ThinkPad keyboards in a new post: http://chip.typepad.com/weblog/2011/04/thinkpad-keyboards.html
Thanks much for the pointers to Text::FindIndent, Emacs::PDE, Devel::PerlySense, Sepia, and flymake. I'll check them out.
Perl Development Essentials
In a related matter, do HT receivers typically have any mixing ability? I'd like to e.g. play music from device A while also playing a game with game console B.
Home audio stuff (Does HDMI/3D surround matter?)
lazyweb high-def audio geeks, I just sold my current YAMAHA receiver RX-V659BL from 2006 today, because its depth is way beyond the new TV rack's depth I bought at IKEA (how lame reason is that!), and am now thinking of buying a new receiver to replace that. My only requirement for the receiver ...