This is Abe Hassan's Typepad Profile.
Abe Hassan
Recent Activity
Often as system administrators, we're doing three things at once: administering an application or a service; providing support to the folks who use that service; and using that service itself. In systems where permissions can be assigned on a fine-grained level, those tend to get rolled up into a handful of roles. The most basic case is to have two roles, "user" and "administrator", where the latter has all the privileges of the former. When answering a user's question, I've frequently found myself wondering: is this something everyone can do, or is it because I have elevated privileges? Unless the... Continue reading
Posted Jan 21, 2014 at abe hassan | blog
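The "is this something everyone can do?" question from the excerpt above can be answered mechanically once roles are modeled as sets of permissions. Here is a minimal sketch, assuming the two-role setup described in the post; the permission names and helper functions are hypothetical and not taken from any real system.

```python
# Hypothetical permission names, for illustration only.
USER_PERMISSIONS = frozenset({"read_dashboard", "edit_own_profile"})
ADMIN_ONLY_PERMISSIONS = frozenset({"manage_users", "edit_settings"})

# "administrator" has all the privileges of "user", plus its own.
ROLES = {
    "user": USER_PERMISSIONS,
    "administrator": USER_PERMISSIONS | ADMIN_ONLY_PERMISSIONS,
}


def roles_allowing(permission):
    """Return the sorted names of every role that grants the permission."""
    return sorted(role for role, perms in ROLES.items() if permission in perms)


def everyone_can(permission):
    """True only if every role grants the permission."""
    return all(permission in perms for perms in ROLES.values())
```

With this shape, `everyone_can("manage_users")` tells you whether an answer you are about to give a user only works because of your own elevated role.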
I like using the built-in iOS Mail and Calendar apps for my Google accounts. I can't delete 'em, so I might as well use them. Otherwise my OCD will take over, and in general they seem to work just fine for me and my weird workflows... Except for one thing: they only show my primary calendar. I have access to a handful of shared calendars and there's no easy way to display them. Turns out that there's an option to display more calendars. Go to the Google Mobile Sync page from your phone, find your device, and check off... Continue reading
Posted Oct 8, 2013 at abe hassan | blog
Abe Hassan added a favorite at Everything Typepad
Aug 19, 2013
WARNING: window sizing for tcp sources were changed in syslog-ng 3.3, the configuration value was divided by the value of max-connections(). The result was too small, clamping to 100 entries. Ensure you have a proper log_fifo_size setting to avoid message loss.; orig_log_iw_size='25', new_log_iw_size='100', min_log_fifo_size='100000'. Most of what I've found on the Internets (including this fantastic syslog-ng tuning guide from Etsy) will tell you to adhere to this formula: max_connections * log_fetch_limit <= log_iw_size. They also tell you that log_fetch_limit defaults to 10. So you do the math a little bit and you wonder why setting log_iw_size = 10000 with max_connections... Continue reading
Posted Jun 27, 2013 at abe hassan | blog
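The arithmetic in that warning can be sketched in a few lines. This is a reading of the warning text only (divide the configured window by max-connections(), clamp the result to 100), not syslog-ng's actual code; the function names and the example values 1000 and 40 are hypothetical.

```python
MIN_WINDOW = 100  # clamp value quoted in the warning text


def per_connection_window(log_iw_size, max_connections):
    """Window each connection gets, assuming the division the warning describes."""
    window = log_iw_size // max_connections  # integer division is an assumption
    return max(window, MIN_WINDOW)


def satisfies_formula(max_connections, log_fetch_limit, log_iw_size):
    """The commonly quoted rule: max_connections * log_fetch_limit <= log_iw_size."""
    return max_connections * log_fetch_limit <= log_iw_size


# Hypothetical settings: 1000 // 40 is 25, which is below the clamp.
window = per_connection_window(log_iw_size=1000, max_connections=40)
```

Under these assumed values the configuration satisfies the quoted formula with a fetch limit of 10 (40 * 10 <= 1000), yet the per-connection result of 25 would still be clamped to 100.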
Hey, we did look at Riemann and we found it wasn't quite mature enough for what we wanted. In particular I think its failing comes down to two things: that it only accepts incoming data (as opposed to being able to poll for it), and that it doesn't retain the data for long enough. If everything could be Riemann-compatible, it'd work; but as it stands I have legacy applications, I have network devices, I have a whole bunch of things that need to be polled for data. Additionally, push-only means that it doesn't quite have the concept of a failing check -- it just knows about incoming data. We also ran into issues around it not being a long-term data store -- when we wanted to alert on a historical threshold (rather than "last 10 seconds" or "last 10 data points") we'd need to keep everything in RAM, etc. It might be a good router and decisioning layer though, just not if I have to have my entire data set in RAM to be able to make intelligent historical decisions.
1 reply
All this talk over the last few years about devops skirts around one particularly cynical take on traditional system administration: that it slows product development down. That error-proofing everything is unnecessary, that it's purely overhead and catastrophizing, at the expense of shipping new features. For many, certain failure modes aren't worth preventing, and there's a whole lot of these tradeoffs happening in the deployment space in order to ship new features, faster. I think the core of this comes down to package management. This is, by far, the biggest point of friction I've experienced. Certainly there are other places where... Continue reading
Posted Jun 19, 2013 at abe hassan | blog
A few months back, we were getting the graphite and statsd stack set up in our world. The frontend functionality offered by Graphite and the easy-to-use interface offered by Statsd are incredibly powerful. But our use of statsd quickly presented a problem to us: we wanted to understand cluster health, but we also wanted to understand per-host health in a way where we could do blue/green deployment. Statsd doesn't easily give you that ability, at least not out of the box: If you run a statsd server on each machine, then your metrics stay separated even up until you get... Continue reading
Posted Jun 18, 2013 at abe hassan | blog
About a year ago, I was trying to figure out the right config mgmt recipes to make a CentOS 4-based machine properly auth against our internal LDAP servers. I was having a pretty hard time of it -- we had nailed it for Debian and for CentOS 5, but CentOS 4 changed some additional files when I ran `authconfig`. I couldn't figure it out, and sort of paradoxically, I therefore only had one opportunity, per new machine, to figure it out. Otherwise I had to throw away the machine and re-instantiate, and even though it only took a couple minutes,... Continue reading
Posted Jun 17, 2013 at abe hassan | blog
Abe Hassan added a favorite at Seth's Blog
Jun 13, 2013
Nick, you're totally right. I had removed it from Notification Center, I guess that didn't disable the badges/banners by default. Forgot that Notification Center isn't a superset of Notifications. My bad. :D
1 reply
Google's notion of bringing its Voice, Chat/Talk, Hangouts, and related products all under the umbrella of "Google Hangouts" is awesome. The way in which those tools were integrated into the rest of the Google ecosystem was pretty haphazard (why can I place a call from Gmail but not from Google Voice?). Seeing a concentrated effort to use the same messaging system is great. Except. I've switched to the new Hangouts on my personal account and I've found a few things that bug me: There's no way to globally disable logging within Google. I can turn it off per-chat, but I... Continue reading
Posted May 28, 2013 at abe hassan | blog
Abe Hassan added a favorite at Everything Typepad
May 14, 2013
Newspapers and magazines are dying, but blogs and news sites and Twitter and Facebook aren't enough to fill the space they're leaving. At Say Media, we're building the next generation of digital media. I thought I'd take a diversion from my usual tech blogging to share my perspective on what we're doing, and why I'm so excited for it. We know that traditional media -- newspapers and magazines and television and radio -- aren't keeping up with the wants and needs of an increasingly online and off-the-grid society. We know that those companies are seeing this trend, and they're trying... Continue reading
Posted Apr 24, 2013 at abe hassan | blog
Bank websites are down for something like 4-8 hours every Saturday evening for scheduled maintenance. Even non-technical people are familiar with Twitter's "fail whale". Tumblr has one of the worst uptime records of all blogging services (at least in 2011, though I'm not sure much has changed in 2012). The Apple Store falls over every time a new iPhone goes on sale (and it's down for "scheduled maintenance" on announcement days). And yet that doesn't stop us from using those services. I think a company's interest in uptime breaks down into two elements: monetary and reputational. Monetary implications are complex... Continue reading
Posted Apr 8, 2013 at abe hassan | blog
About a month ago, we had a mini-hackathon at work wherein we ported alllll of our systems from talking to Ganglia to instead talking to Graphite. A coworker was amused and snapped a photo: Nerdfest! Continue reading
Posted Feb 28, 2013 at abe hassan | blog
According to a lot of universities, software development is still a mix of computer science and electrical engineering. It was only in May 2012 that NCEES announced that they would be offering a Principles and Practice of Engineering (PE) exam in Software, starting this coming April. The foundational work of Software Engineering has been happening for decades: the invention of calculators and computers and all the innovation on the hardware side of things; pieces of software that have become critical to the Internet (think about apache or memcache or even Linux); and revolutionary design patterns (think about map-reduce or the c10k problem). Over... Continue reading
Posted Feb 25, 2013 at abe hassan | blog
Here are two problems that we ran into at LiveJournal. First, in a paginated system, how do you know whether to display the next/previous links? Second, with password prompts being masked out, how can you tell a user that they kept typing but the prompt didn't record more? In the first case: if you display 20 items per page, load 21. If you get 21, throw out the last one and add the pagination links. If you get 20 or fewer, you know there is no next page to link to. Doing a database query with a "LIMIT 20" doesn't give you enough information to do this. In... Continue reading
Posted Feb 8, 2013 at abe hassan | blog
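The "load 21 to show 20" trick from the pagination excerpt above can be sketched as follows. This is a minimal illustration using SQLite; the `items` table, its columns, and the `fetch_page` helper are hypothetical and are not LiveJournal's code.

```python
import sqlite3

PAGE_SIZE = 20


def fetch_page(conn, page):
    """Return (rows, has_next, has_previous) for a zero-based page number."""
    offset = page * PAGE_SIZE
    # Ask for one row more than we intend to display.
    rows = conn.execute(
        "SELECT id, title FROM items ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE + 1, offset),
    ).fetchall()
    # The extra row only tells us a next page exists; it is never shown.
    has_next = len(rows) > PAGE_SIZE
    has_previous = page > 0
    return rows[:PAGE_SIZE], has_next, has_previous
```

The single extra row avoids a separate COUNT query: a page that comes back with exactly 20 rows is known to be the last one.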
Yes, you're totally right. You need more information than the Five Number Summary. You can turn two averages into an average if you have the number of original data points; and you can turn two stddevs into one if you have the averages and number of data points. But percentiles are a whole 'nother story. My suspicion is that Graphite's concept of percentiles is related to the data points it has stored. So it's not the 90th percentile *at that point*, but rather the 90th percentile of the data in the metric. To get 90th percentile at a given point in time, I would use statsd, which can calculate that and emit it to Graphite. So there's a percentile at a point in time, and then a percentile across all time (or across the last X data points). I suspect Graphite is doing the latter. Technically valid, but super duper confusing.
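To make the point about averages and stddevs concrete, here is a minimal sketch of merging two summaries when the counts are known. It assumes population (not sample) standard deviations; the function name and the example numbers are made up for illustration. No comparable merge exists for percentiles, which is the comment's point.

```python
import math


def combine(n1, mean1, std1, n2, mean2, std2):
    """Merge two (count, mean, population stddev) summaries into one."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Each group contributes its own variance plus the squared shift
    # of its mean from the combined mean, weighted by its count.
    variance = (
        n1 * (std1 ** 2 + (mean1 - mean) ** 2)
        + n2 * (std2 ** 2 + (mean2 - mean) ** 2)
    ) / n
    return n, mean, math.sqrt(variance)


# Summaries of the made-up data sets [2, 4] and [6].
n, mean, std = combine(2, 3.0, 1.0, 1, 6.0, 0.0)
```

The result matches what you would get by computing the mean and population stddev of [2, 4, 6] directly.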
1 reply
Abe Hassan added a favorite at Please. Fix. That.
Jan 31, 2013
Back when I worked at LiveJournal, Brad Fitzpatrick commented: "Our standards are continually being raised, so by definition any old code is ugly because new best practices have emerged in the meantime." This has always stuck with me in the back of my mind, but it came up again in a conversation earlier this week. In trying to find the right way to manage a backlog of technical debt -- a whole topic in itself -- I pointed out that the discussion conflated two issues: one was technical debt, and I called the other "technical depreciation". Your code may have... Continue reading
Posted Jan 30, 2013 at abe hassan | blog
I haven't, yet. It seems like it would break existing use to make this change, even if it is to fix it. (Maybe people have come to expect derivative to be a delta between data points.) But maybe I should let them make that decision?
1 reply
We had a day-long war room yesterday wherein we ported a lot of our scripts and systems from writing to Ganglia to instead write to Graphite. We got to simplify a lot of stuff, primarily around rate-of-change calculations. Instead of doing those ourselves, we were sending in total values, and letting Graphite handle the "per second" part of things. So it was a little odd when Hachi called me over and pointed out something funny. He was plotting the function "derivative(metric.memcache_gets)", where memcache_gets is the total lifetime gets that it has served. Over the last hour, the graph hovers around... Continue reading
Posted Jan 25, 2013 at abe hassan | blog
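The distinction behind that surprise, a delta between data points versus a per-second rate, can be sketched like this. The functions below only model the two interpretations on a plain list of samples; they are not Graphite's implementation, and the 10-second step and counter values are hypothetical.

```python
def derivative(values):
    """Delta between consecutive data points; the first point has no delta."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        # A missing sample on either side makes the delta undefined.
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def per_second(values, step_seconds):
    """Same deltas, divided by the time between samples."""
    return [None if d is None else d / step_seconds for d in derivative(values)]


# Hypothetical lifetime counter sampled every 10 seconds.
samples = [1000, 1500, 2000]
deltas = derivative(samples)
rates = per_second(samples, step_seconds=10)
```

With a 10-second step the two series differ by a factor of ten, so a graph of deltas can look wrong to someone expecting a per-second rate.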
Hmm. I think the reason I exclude Facebook is that the newsfeed is status updates, activity updates, etc. Less about being a stream of your posts. Notes doesn't seem to be an often-used feature, and the status update box doesn't encourage long-form writing. So it's more geared towards "what's going on" rather than "what's on your mind". I think it has all the right features, but the emphasis is different.
1 reply
I spent many, many years caring about and caring for LiveJournal. One of our biggest challenges was figuring out how the site should evolve. Its age almost guaranteed that we had some sizable portion of the userbase using any given feature, which made it harder to retire features; our extraordinary ability to botch major announcements made us even more scared to do it. And the new stuff we were shipping didn't generally have a cohesive vision behind it. If it did, we'd be able to establish momentum behind that vision, and a better way of talking about it. Anyway. That... Continue reading
Posted Jan 22, 2013 at abe hassan | blog
"statsd to carbon, flushInterval 10s, data is fine. carbon writes the value then overwrites with zeros. what the what?" (Abe Hassan, @burr86, January 10, 2013)
I'm so mad at myself right now. We're getting statsd and graphite deployed, and we've been feeding some counters into statsd. We've generally found this to be fairly straightforward, except that one of our counters has been a bit schizophrenic. Sometimes it reports correctly, and sometimes it reports zero (rather than null). In fact, watching the whisper files directly, I sometimes see the timestamp and the value written to the file, and then immediately overwritten... Continue reading
Posted Jan 10, 2013 at abe hassan | blog