This is Chip Salzenberg's Typepad Profile.
Chip Salzenberg
Interests: evolution, cool technology, haskell, sf, martial arts
Recent Activity
"It shouldn't be *a* queue, it should be *many* queues with non-trivial relationships between them." Fair enough. It's close enough to what I asked for, in my inchoate way. "I assume "now" above means at the time of a *second* failure, even though you don't say so, because it would be absurd to expect data survival if you had originally written with W=1." Use cases vary. In our case, we are underprovisioned enough by design that we do run W=1 (or its moral equivalent, single master) and watch our replication very carefully. Sometimes we lose data, but when it happens, we know which data and can replicate it from upstream in our overall process, simply by rewinding the upstream feed. In this environment, knowing that replication is behind by (say) no more than a minute means that after a crash and master/slave swap, we need only rewind the upstream to a minute before the crash and we're all good. Uncertainty and arbitrarily delayed replication in the manner of Cassandra [and, I believe, GlusterFS?] defeat this strategy, and would require us to switch to a highly overprovisioned strategy that supports full traffic at W=2. We would rather not. As for my memory of anti-entropy, I did know about it at one time; my memory had simply faded. The fact of its having to re-read and re-write the world made it operationally indistinguishable from a mass read-repair, which is my excuse such as it is. Anti-entropy's use of merkle trees accomplishing a reduction of network traffic but not a significant change in strategies and weaknesses. As for what I would consider -not- fatally flawed: I think I've been clear. I want to know for sure how far back in time to go before I reach an assurance that the data written then are safe, and I want that number to be kept as low as possible at all times, hopefully without manual intervention.
That people are using something doesn't mean that thing is fit for the use. cf. Windows. I am curious how you know read repair is working. It seems to me that it could be working very badly and you might never know, unless you are so overprovisioned that write replication never fails you and nodes never die.

Full repair is conceptually identical to mass read repair, in that it compares what the nodes have and makes sure they end up sharing what any of them has. It doesn't require sending the full data, but that's not a relevant difference to me. It still requires READING the full data, and iops are the more precious resource.
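For concreteness, a minimal sketch of the read-repair mechanism under discussion, with replicas modeled as in-memory dicts (all names hypothetical):

    def read_with_repair(key, replicas):
        # Each replica is modeled as a dict: key -> (timestamp, value).
        # Read from every replica, return the newest copy, and write it back
        # to any replica holding a stale copy or none at all. Note the repair
        # writes are ordinary writes: they can fail or be lost, and nothing
        # tells you whether they stuck.
        copies = [(r, r.get(key)) for r in replicas]
        present = [tv for _, tv in copies if tv is not None]
        if not present:
            return None
        newest = max(present)                 # (timestamp, value), newest wins
        for r, tv in copies:
            if tv != newest:
                r[key] = newest               # best-effort repair write
        return newest[1]

    # Example: one replica missed the latest write, one has nothing at all.
    a, b, c = {"k": (2, "new")}, {"k": (1, "old")}, {}
    print(read_with_repair("k", [a, b, c]))   # "new"; b and c get repaired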
To clarify what certainly looked like a backtrack, and maybe was: I don't need client-side crypto, so requirements relating to it were not on my mind. Read repair as a vital part of replication is ludicrous in the absence of crypto, but if it's there to make crypto better, and you've set yourself an arbitrary-byte-range write requirement, it's not ludicrous ... just inconvenient (modulo the notification issues you acknowledge). Also, I never changed my definition of blob; we started with different definitions and have now achieved mutual understanding (if not respect). If you say they're equivalent for the theory, I don't know enough to argue. OK, enough conversational housekeeping.

"In the operational long term, there should be a guarantee that the system will restore itself to N replicas in finite time when there are no new requests." I can entirely agree with this. But "It's not feasible to expect the same when new requests continue to come in, or to expect that the current replication state will be perfectly knowable at one place and time" is, I think, a straw man. Knowing "replication is keeping up generally, with a queue length max of 5 sec" is plenty precise for operational use. Similarly, "single global order" is a straw man; I meant local order, as in, if node A has things to send to node B, it can keep a queue of them so they get delivered, in order, eventually. Your adjective "non-deterministic" hits the nail on the head, I think. Queues provide determinism, but it's the determinism I'm missing, not the queues.

It'd be neat if Cassandra's anti-entropy feature worked entirely as you describe, but that doesn't mean my fire-and-forget description was inaccurate. Suppose a node entirely dies rather than simply being temporarily offline: a failed replication may have gone an arbitrary time without being noticed and retried, and now the node with the only good copy is gone.

As for why I'm doing this: I'm offended by Cassandra's marketing. Cassandra's boosters don't tell the truth, even if perchance they don't lie. Cassandra *isn't* ready for production; I wasted a lot of time on it, and I'm trying to save other people the same pain. Similarly, I was considering GlusterFS for a project and found it wanting for different but related reasons. Perhaps I should rename my blog "Mene Mene Tekel Parsin." Remember it was a king who got that message, so don't feel put down. Granted, he wasn't a king for much longer.

PS: Calling Cassandra's performance issues with repair "warts" is very, very generous.
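What I mean by local order is nothing exotic; a per-destination FIFO with retry is enough. A minimal sketch (names hypothetical; send is whatever transport you have, returning True on success):

    from collections import deque

    class ReplicationQueue:
        # Node A keeps one of these per peer: updates destined for node B are
        # delivered in order, retrying from the head until each one succeeds.
        # Nothing is silently dropped, so the queue length is itself the
        # operational number I keep asking for: how far behind is this peer?
        def __init__(self, send):
            self.send = send
            self.pending = deque()

        def enqueue(self, update):
            self.pending.append(update)

        def drain(self):
            while self.pending:
                if not self.send(self.pending[0]):
                    break                     # peer unreachable; retry later, order kept
                self.pending.popleft()

        def backlog(self):
            return len(self.pending)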
"No, it doesn't handle partition/split-brain situations" Well, that's something agreed. Good. "Your PS about encrypted data etc. is only correct if you're always writing whole files, which is a tyro's assumption." You'll note I said "blobs", not "files". Blobs can be sub-file elements. If the design of GlusterFS doesn't allow for encrypted sub-file blobs, that doesn't harm my argument. Perhaps such a limitation was accepted for good reason. In any case, the concept of read repair is not at fault here; demanding some client cooperation is acceptable, if not ideal. It's the implementation's requirement that the client Just Has To Know when to do it that is the flaw: lack of notice when guarantees are not being met. How is a client supposed to know when read repair pass is required? Perhaps more tellingly, when can a client know for certain *no* read repair passes are required because connectivity has been perfect? "At least ... vapor stage ..." Want to help? Oh, sorry, I slipped into Darcy mode there for a second. If you mean to bring in an ad hominem argument, you're not doing very well. You should know more about what you criticize (he wrote with deliberate sarcasm). Perl 5 is hardly vaporware. It's not even dead. :-)
PS - as long as the pure client-side encryption is -consistent-, there is no need for read repair. Nodes can flood-propagate the encrypted blobs just fine, without any need to understand them.
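A minimal sketch of that flood propagation, with nodes modeled as dicts keyed by content digest (names hypothetical; the ciphertext is whatever opaque bytes the client produced):

    import hashlib

    def blob_id(ciphertext: bytes) -> str:
        # Consistent client-side encryption means identical plaintext yields
        # identical ciphertext, so a digest of the ciphertext is a stable id.
        return hashlib.sha256(ciphertext).hexdigest()

    def flood_propagate(origin, peers, ciphertext: bytes):
        # Store the opaque blob on the origin node and push it to every peer.
        # No node ever decrypts or interprets the blob; digest equality is all
        # the comparison that is ever needed.
        key = blob_id(ciphertext)
        origin[key] = ciphertext
        for peer in peers:
            if key not in peer:
                peer[key] = ciphertext

    # Example: three nodes converge on the same opaque bytes.
    n1, n2, n3 = {}, {}, {}
    flood_propagate(n1, [n2, n3], b"opaque ciphertext bytes")
    assert n1 == n2 == n3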
GlusterFS replication is, without doubt, a joke. Offering "replication" without ensuring that, when you need a complete and correct replica, you will have one, is nothing more than a joke. Unless it's bait-and-switch: "Sure we have replication. See, we write the data more than once! [usually] No problem!" Pick your explanation: incompetence or deception. They claim 3.3 will be better. We'll see. They've hardly proven themselves trustworthy.

WRT Cassandra: if you had read my article and understood it, you'd understand that Cassandra's read repair still offers no guarantee that it worked, because the replication events triggered by the read repair are just as vulnerable to loss as the originals were. And you'll never know until it's too late. Meanwhile, a Cassandra node repair is even *worse* than a typical fsck, because the *good* data have to be copied too. Have you experienced it yourself, on a machine at capacity (actually half capacity, because Cassandra steals 50% of your disk space)? I have, which is why I know it's a joke.
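To make the iops point concrete: the merkle trees that a repair compares have to be built by hashing every block a node holds, i.e. by reading the whole dataset, even when nothing turns out to differ. A minimal sketch (hypothetical, not Cassandra's actual code):

    import hashlib

    def merkle_root(blocks):
        # Hash every block (this is the full-data read), then fold pairwise up
        # to a single root. Comparing two roots costs almost no network, but
        # the read iops for the leaves are spent regardless of the outcome.
        level = [hashlib.sha256(b).digest() for b in blocks]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])       # duplicate the odd block out
            level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                     for i in range(0, len(level), 2)]
        return level[0] if level else b""

    # Two replicas agree iff their roots match; finding which range differs
    # means walking down the trees, but the expensive part -- reading
    # everything to build the leaves -- has already happened.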
I did say that fds fall under the Utility Corollary. If you could know that every fd was a plain file, would your fd-manipulating code be simpler? Of course. So that's the Pretense Rule at work. But for the sake of utility Unix (and even more Plan 9) extended fds to cover lots of non-file things. The loss of simplicity is acceptable.
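A small illustration of that trade, using Python's os module on a Unix-like system (purely illustrative): the same read/write loop works on any fd, which is the utility; the cost is that file-only assumptions like seeking no longer hold.

    import os

    def copy_fd(src_fd, dst_fd, bufsize=65536):
        # Assumes nothing about the fd beyond read/write: works equally on a
        # plain file, a pipe, a socket, or a tty.
        while True:
            chunk = os.read(src_fd, bufsize)
            if not chunk:
                break
            os.write(dst_fd, chunk)

    r, w = os.pipe()
    os.write(w, b"not a plain file\n")
    os.close(w)
    copy_fd(r, 1)                      # fine: read and write are universal
    # os.lseek(r, 0, os.SEEK_SET)      # would raise OSError -- pipes aren't
                                       # seekable, so code that pretended
                                       # "plain file" would break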
Well, AB, that sounds excellent! I will watch for 3.3.
Jeremy: You think they don't know? I've spoken to many of the devs in person. My company hired Riptano. It's not like this is a secret; it's their freaking DESIGN PHILOSOPHY: "Lose all the data you want--the user can always make more."
If mockery were not a powerful weapon, a lot of political battles would have gone differently. The incredibly roundabout path to 6 and the 5/6 split open us to mockery, and are thus inherently problems, if only of image.
Commented Jun 27, 2011 on Perl 5? Perl 6? Perl X? at Modern Perl
"Flat, wobbly, chicklet-style keys with flimsy rubber dome contacts" ... thank you for giving our pain a name. That's exactly how it breaks down. So to speak.
As for perltidy, I have never enjoyed working with automated layout systems. Smart editors, sure. But (1) any code so messed up that you need an indenter to read it probably isn't worth reading. Seriously. And (2) I follow TomC's maxim that logically parallel constructs should be visually parallel as well, and automatic indenters ruin that.
Commented Apr 15, 2011 on Perl Development Essentials at The New NASA Calendar
I've elaborated on the virtues of ThinkPad keyboards in a new post: http://chip.typepad.com/weblog/2011/04/thinkpad-keyboards.html Thanks much for the pointers to Text::FindIndent, Emacs::PDE, Devel::PerlySense, Sepia, and flymake. I'll check them out.
Commented Apr 15, 2011 on Perl Development Essentials at The New NASA Calendar
In a related matter, do HT receivers typically have any mixing ability? I'd like to e.g. play music from device A while also playing a game with game console B.