This is Abe Hassan's TypePad Profile.
Abe Hassan
Recent Activity
Abe Hassan added a favorite at Everything Typepad
May 14, 2013
Newspapers and magazines are dying, but blogs and news sites and Twitter and Facebook aren't enough to fill the space they're leaving. At Say Media, we're building the next generation of digital media. I thought I'd take a diversion from my usual tech blogging to share my perspective on what we're doing, and why I'm so excited for it. We know that traditional media -- newspapers and magazines and television and radio -- aren't keeping up with the wants and needs of an increasingly online and off-the-grid society. We know that those companies are seeing this trend, and they're trying...
Posted Apr 24, 2013 at abe hassan | blog
Bank websites are down for something like 4-8 hours every Saturday evening for scheduled maintenance. Even non-technical people are familiar with Twitter's "fail whale". Tumblr has one of the worst uptime records of all blogging services (at least in 2011, though I'm not sure much has changed in 2012). The Apple Store falls over every time a new iPhone goes on sale (and it's down for "scheduled maintenance" on announcement days). And yet that doesn't stop us from using those services. I think a company's interest in uptime breaks down into two elements: monetary and reputational. Monetary implications are complex...
Posted Apr 8, 2013 at abe hassan | blog
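The bank example is easy to put in numbers. Here's a quick sketch (the function name is just illustrative) that turns a fixed weekly maintenance window into the best-case availability it allows, using the 4 and 8 hour figures from the post.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours

def weekly_availability(downtime_hours: float) -> float:
    """Best-case availability if a service is down for a fixed window every week."""
    if not 0 <= downtime_hours <= HOURS_PER_WEEK:
        raise ValueError("downtime must be between 0 and 168 hours")
    return 1 - downtime_hours / HOURS_PER_WEEK

for hours in (4, 8):
    # Format as a percentage with two decimal places.
    print(f"{hours} hours/week down: {weekly_availability(hours):.2%} available")
```

That works out to roughly 97.6% and 95.2%, both short of "two nines" (99%), which only leaves about 1.7 hours of downtime a week.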
About a month ago, we had a mini-hackathon at work wherein we ported alllll of our systems from talking to Ganglia to instead talking to Graphite. A coworker was amused and snapped a photo: Nerdfest!
Posted Feb 28, 2013 at abe hassan | blog
According to a lot of universities, software development is still a mix of computer science and electrical engineering. It was only in May 2012 that NCEES announced that they would be offering a Principles and Practice of Engineering exam in Software Engineering, starting this coming April. The foundational work of Software Engineering has been happening for decades: the invention of calculators and computers and all the innovation on the hardware side of things; pieces of software that have become critical to the Internet (think about Apache or memcache or even Linux); and revolutionary design patterns (think about map-reduce or the c10k problem). Over...
Posted Feb 25, 2013 at abe hassan | blog
Here are two problems that we ran into at LiveJournal: In a paginated system, how do you know whether to display the next/previous links? With password prompts being masked out, how can you tell a user that they kept typing but the prompt didn't record more? In the first case: if you display 20 items per page, load 21. If you get 21, throw out the last one and add the pagination links. If you get 20 or fewer, then you know there's no next page. Doing a database query with a "LIMIT 20" doesn't give you enough information to do this. In...
Posted Feb 8, 2013 at abe hassan | blog
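The load-one-extra trick from the pagination example can be sketched with SQLite from the standard library. The `items` table, its columns, and the `fetch_page` helper are hypothetical; the point is asking for `per_page + 1` rows so the extra row tells you whether a next page exists.

```python
import sqlite3

def fetch_page(conn, page, per_page=20):
    """Return (rows, has_prev, has_next) for a 1-indexed page."""
    offset = (page - 1) * per_page
    # Ask for one more row than we will actually display.
    rows = conn.execute(
        "SELECT id, title FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page + 1, offset),
    ).fetchall()
    has_next = len(rows) > per_page  # the extra row exists, so there is a next page
    has_prev = page > 1              # any page after the first has a previous page
    return rows[:per_page], has_prev, has_next  # throw the extra row away

# Hypothetical data: 45 items, so page 1 is full and page 3 is partial.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO items (title) VALUES (?)", [(f"post {i}",) for i in range(45)])

rows, has_prev, has_next = fetch_page(conn, 1)
print(len(rows), has_prev, has_next)
```

Note that only the "next" link needs the extra row; with offset-based paging, the "previous" link just depends on which page you're on.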
Yes, you're totally right. You need more information than the Five Number Summary. You can turn two averages into an average if you have the number of original data points; and you can turn two stddevs into one if you have the averages and number of data points. But percentiles are a whole 'nother story. My suspicion is that Graphite's concept of percentiles is related to the data points it has stored. So it's not the 90th percentile *at that point*, but rather the 90th percentile of the data in the metric. To get 90th percentile at a given point in time, I would use statsd, which can calculate that and emit it to Graphite. So there's a percentile at a point in time, and then a percentile across all time (or across the last X data points). I suspect Graphite is doing the latter. Technically valid, but super duper confusing.
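The averages-and-stddevs half of that is easy to show. This sketch assumes population standard deviations and that each summary carries its count; the `combine` helper is made up for illustration, and it can be checked against computing the stats over the raw data.

```python
import math
import statistics

def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Merge two (count, mean, population stddev) summaries into one."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Each group's sum of squared deviations from the combined mean
    # is n_i * (sd_i**2 + (mean_i - mean)**2).
    ss = n1 * (sd1 ** 2 + (mean1 - mean) ** 2) + n2 * (sd2 ** 2 + (mean2 - mean) ** 2)
    return n, mean, math.sqrt(ss / n)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0]
merged = combine(len(a), statistics.fmean(a), statistics.pstdev(a),
                 len(b), statistics.fmean(b), statistics.pstdev(b))
print(merged)
```

There's no equivalent identity for percentiles: the 90th percentile of each half doesn't determine the 90th percentile of the whole, which is why you need the raw points (or something like statsd computing it per interval).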
Abe Hassan added a favorite at Please. Fix. That.
Jan 31, 2013
Back when I worked at LiveJournal, Brad Fitzpatrick commented: "Our standards are continually being raised, so by definition any old code is ugly because new best practices have emerged in the meantime." This has always stuck with me in the back of my mind, but it came up again in a conversation earlier this week. In trying to find the right way to manage a backlog of technical debt -- a whole topic in itself -- I pointed out that the discussion conflated two issues: one was technical debt, and I called the other "technical depreciation". Your code may have...
Posted Jan 30, 2013 at abe hassan | blog
I haven't, yet. It seems like it would break existing use to make this change, even if it is to fix it. (Maybe people have come to expect derivative to be a delta between data points.) But maybe I should let them make that decision?
We had a day-long war room yesterday wherein we ported a lot of our scripts and systems from writing to Ganglia to instead write to Graphite. We got to simplify a lot of stuff, primarily around rate-of-change calculations. Instead of doing those ourselves, we were sending in total values, and letting Graphite handle the "per second" part of things. So it was a little odd when Hachi called me over and pointed out something funny. He was plotting the function "derivative(metric.memcache_gets)", where memcache_gets is the total lifetime gets that it has served. Over the last hour, the graph hovers around...
Posted Jan 25, 2013 at abe hassan | blog
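To make the distinction between a delta and a rate concrete, here's a sketch with a lifetime counter sampled once a minute. A plain derivative is the delta between consecutive data points, so its scale depends on the storage interval; dividing by the elapsed seconds gives an actual per-second rate. The series and the 60-second spacing are hypothetical, and the sketch ignores counter resets (Graphite's `nonNegativeDerivative()` exists for those, and newer Graphite releases also have `perSecond()`).

```python
def derivative(points):
    """Delta between consecutive (timestamp, value) points."""
    return [(t2, v2 - v1) for (t1, v1), (t2, v2) in zip(points, points[1:])]

def per_second(points):
    """Delta divided by the seconds elapsed between consecutive points."""
    return [(t2, (v2 - v1) / (t2 - t1)) for (t1, v1), (t2, v2) in zip(points, points[1:])]

# Hypothetical lifetime gets counter, stored every 60 seconds,
# growing at a steady 300 gets per second.
series = [(t, 300 * t) for t in range(0, 301, 60)]

print(derivative(series)[:2])  # [(60, 18000), (120, 18000)]
print(per_second(series)[:2])  # [(60, 300.0), (120, 300.0)]
```

The same steady traffic reads as 18000 or 300 depending on which function you plot, which is exactly the kind of number that looks "funny" on a graph.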
Hmm. I think the reason I exclude Facebook is that the newsfeed is status updates, activity updates, etc. Less about being a stream of your posts. Notes doesn't seem to be an often-used feature, and the status update box doesn't encourage long-form writing. So it's more geared towards "what's going on" rather than "what's on your mind". I think it has all the right features, but the emphasis is different.
I spent many, many years caring about and caring for LiveJournal. One of our biggest challenges was figuring out how the site should evolve. Its age almost guaranteed that we had some sizable portion of the userbase using any given feature, which made it harder to retire features; our extraordinary ability to botch major announcements made us even more scared to do it. And the new stuff we were shipping didn't generally have a cohesive vision behind it. If it had, we'd have been able to establish momentum behind that vision and find a better way of talking about it. Anyway. That...
Posted Jan 22, 2013 at abe hassan | blog
"statsd to carbon, flushInterval 10s, data is fine. carbon writes the value then overwrites with zeros. what the what?" (Abe Hassan, @burr86, January 10, 2013)
I'm so mad at myself right now. We're getting statsd and graphite deployed, and we've been feeding some counters into statsd. We've generally found this to be fairly straightforward, except that one of our counters has been a bit schizophrenic. Sometimes it reports correctly, and sometimes it reports zero (rather than null). In fact, watching the whisper files directly, I sometimes see the timestamp and the value written to the file, and then immediately overwritten...
Posted Jan 10, 2013 at abe hassan | blog
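For reference, here's roughly what feeding a counter into statsd looks like: one plain-text line per sample, sent over UDP, which statsd sums up and forwards to carbon every flushInterval (10 seconds in the tweet). This is a sketch, not a client library; `app.logins` is a made-up metric name and 8125 is statsd's default port, so adjust both for your setup.

```python
import socket

def counter_packet(name: str, value: int = 1, sample_rate: float = 1.0) -> bytes:
    """Build a statsd counter line such as b'app.logins:1|c'."""
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"  # statsd scales sampled counts back up
    return line.encode("ascii")

def send_counter(name: str, value: int = 1, host: str = "127.0.0.1", port: int = 8125) -> None:
    # statsd listens on UDP: fire-and-forget, no reply expected.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(counter_packet(name, value), (host, port))

print(counter_packet("app.logins"))          # b'app.logins:1|c'
print(counter_packet("app.logins", 1, 0.1))  # b'app.logins:1|c|@0.1'
```

Because each flush reports the count accumulated during that interval, a counter that saw no traffic in a given interval can legitimately show up as zero rather than null.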
Abe Hassan added a favorite at Masa aka Sekimura
Jan 9, 2013
For Christmas two years ago, I bought a friend a flying lesson. In advance of the actual class, we were asked to read a couple chapters out of a flying manual. One section described how to hand off the controls to your co-pilot, called a three-way positive control exchange technique. It's explicit on both sides: the person handing off says "your controls", the person receiving says "my controls", and the person handing off says "your controls" again. Within an hour of reading that, I had shared it with my team and asked everyone to adopt that practice for managing outages....
Posted Jan 7, 2013 at abe hassan | blog
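If you wanted incident tooling to enforce the three-way exchange, one way is a tiny state machine that only transfers ownership after all three calls. This is a hypothetical sketch, not something the post describes building; the class and method names are illustrative.

```python
class ControlHandoff:
    """Three-way positive exchange: 'your controls', 'my controls', 'your controls'."""

    def __init__(self, owner: str):
        self.owner = owner
        self._step = 0          # how many of the three calls have happened
        self._receiver = None

    def offer(self, giver: str, receiver: str) -> None:
        # Call 1: the current owner says "your controls".
        if giver != self.owner or self._step != 0:
            raise RuntimeError("only the current owner can start a handoff, one at a time")
        self._receiver, self._step = receiver, 1

    def accept(self, receiver: str) -> None:
        # Call 2: the receiver says "my controls".
        if receiver != self._receiver or self._step != 1:
            raise RuntimeError("handoff was not offered to this person")
        self._step = 2

    def confirm(self, giver: str) -> None:
        # Call 3: the owner repeats "your controls"; only now does ownership move.
        if giver != self.owner or self._step != 2:
            raise RuntimeError("handoff has not been accepted yet")
        self.owner, self._receiver, self._step = self._receiver, None, 0

incident = ControlHandoff(owner="alice")
incident.offer("alice", "bob")
incident.accept("bob")
assert incident.owner == "alice"  # still alice until the final confirmation
incident.confirm("alice")
print(incident.owner)  # bob
```

The design point is the same as in the cockpit: there is never a moment where ownership is ambiguous, because it doesn't change until both sides have said so out loud.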
Sylvain, that's something that I still need to explore more. The first idea that comes to mind is Nagios event handlers, where you can run scripts after certain soft/hard failures. I don't like that, though; I wish there were a better way of triggering this, maybe from tools like Riemann or Jenkins?
Lately at work we've had a number of discussions about improving our ability to deploy solid code. One camp had suggested post-release monitoring, while the other had suggested a thorough QA process before deployment. Separately, I'd also had a number of conversations while investigating application failures where it became clear to me that others who were involved didn't understand that a change in behavior could occur even if a code release hadn't happened. In short, you have monitors that tell you whether something is wrong, and you have monitors that tell you what is wrong. A Selenium test that exercises...
Posted Jan 2, 2013 at abe hassan | blog
The major assertion here is that application development teams are responsible for making sure their application is working properly, on all levels. Business data needs to be digested and managed in other ways (often not even time-series based), but application data often is chronological. But I think we are moving away from the idea that servers belong to one team, code belongs to another, and the business on top of it belongs to a third. The health of an application overall is owned by the development team. Who gets to resolve a problem might vary, sure, but every team needs to have a view into what's going on with their own application. (Some might even argue that they should be the front-line responders, carrying pagers, etc.) With this unified view, important stuff bubbles up to the top, and ideally allows you to understand what's broken and why -- whether or not it's in your control to fix. In that world, you actually spend a lot more time focusing on building the right alerts and making sure they go to the right people/places, because your system is smart enough to know who needs to know what, when, and to ask them to look at a simple view of the world.
I think we all agree: #monitoringsucks. For me, the holy grail is the single pane of glass (pardon the douchey term), a unified view into your environment that shows you everything you need to know. Latest trends, latest service failures, latest deploys, latest everything. Real-time. So many types of devices! So many types of data! So many ways to visualize and analyze it! Instead we end up with a world where you have a bunch of tools doing part of a number of duties. Nagios runs checks, but so does Jenkins, and so does Riemann. Collectd collects data, but so...
Posted Dec 27, 2012 at abe hassan | blog
Over the last couple of months, I've spent a great deal of time thinking about what "devops" means for me and my team at Say Media. The industry overall has been describing devops as a world where developers know more about the operational environment, and where operations folks know more about programming and application development. I think that's a good start, but it overlooks some of the nuances involved in implementing that culture. For the field of software engineering to grow and mature, it needs to bifurcate into specialized disciplines. The most obvious division is "operations" and "development", and then...
Posted Dec 26, 2012 at abe hassan | blog
I think of a high-performance team as one where: individuals are working towards a shared vision; individuals are respectful and professional to each other; individuals are eager to learn and improve; and individuals have the hard technical skills your team needs. When your team consists of people who just plain don't share the same goals as each other, it introduces a level of friction that you can't really get past. You spend too much time trying to either convince or work around those people. Open communication about shared vision and direction is critical. Working towards diametrically opposite goals inhibits building...
Posted Dec 11, 2012 at abe hassan | blog
When chicken hatchlings are born, large commercial hatcheries usually set about dividing them into males and females, and the practice of distinguishing gender is known as chick sexing. Sexing is necessary because the two genders receive different feeding programs: one for the females, which will eventually produce eggs, and another for the males, which are typically destined to be disposed of because of their uselessness in the commerce of producing eggs; only a few males are kept and fattened for meat. So the job of the chick sexer is to pick up each hatchling and quickly determine its sex in...
Posted Nov 23, 2012 at abe hassan | blog
Okay, I'm really trying not to be dumb here, but I need to write this out and I want eyes that are smarter than mine helping me out. Comments and advice and feedback super welcome. I'm trying to figure out a way to do the following:
- Track base system-level metrics like CPU usage, memory usage, watts used, whatever
- Track application-level metrics like requests per second, error rates, login rate, whatever
- Track events like releases or network configuration changes or whatever
- Create graphs of all that information with fine-grained granularity going back 3 months
- Create graphs of average, 50th...
Posted Oct 31, 2012 at abe hassan | blog
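The average-versus-percentile item is where this gets interesting. Here's a rough sketch of summarizing one aggregation window; it uses the nearest-rank percentile definition (one of several in common use), and the latency numbers are invented. Because percentiles don't combine the way averages do, they have to be computed from the raw samples in each window, which is what statsd does for timers before handing results to Graphite.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(window):
    """Summary stats for one aggregation window (e.g. one flush interval)."""
    return {
        "mean": sum(window) / len(window),
        "p50": percentile(window, 50),
        "p90": percentile(window, 90),
    }

# Hypothetical request latencies (ms) collected during one window.
latencies = [12, 15, 11, 90, 14, 13, 16, 250, 12, 15]
print(summarize(latencies))
```

With this data the mean comes out to 44.8 ms because of two slow requests, while the median stays at 14 ms; graphing only the average would hide what most users actually saw.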
There's a classification of servers that I like: Phoenix Servers and Snowflake Servers. Phoenix servers can be rebuilt on demand, while Snowflake servers are uniquely hand-crafted. The ideal is to have a fleet of Phoenix servers, and there are two approaches to building those: having a gold image that you use as your baseline, or having a configuration system that ensures your systems match your spec. In general I believe the latter is more effective at making sure that your environment is running exactly the way you want it. Baked servers are bootstrapped fully pre-configured. Often this means using a...
Posted Oct 29, 2012 at abe hassan | blog
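The core loop of a configuration system is "compare the spec to reality, then fix the differences", and running it a second time should change nothing. Here's a toy, in-memory sketch of that idea; the `Host` type and its package/file resources are hypothetical stand-ins for whatever a real tool manages.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)  # path -> contents

def plan(desired: Host, actual: Host) -> list:
    """List the actions needed to make `actual` match the resources `desired` declares."""
    actions = [("install", p) for p in sorted(desired.packages - actual.packages)]
    for path, content in sorted(desired.files.items()):
        if actual.files.get(path) != content:
            actions.append(("write", path))
    return actions

def converge(desired: Host, actual: Host) -> list:
    """Apply the plan in place and return what was done; a second run is a no-op."""
    actions = plan(desired, actual)
    for kind, name in actions:
        if kind == "install":
            actual.packages.add(name)
        elif kind == "write":
            actual.files[name] = desired.files[name]
    return actions

spec = Host(packages={"nginx", "ntp"}, files={"/etc/motd": "managed by config\n"})
snowflake = Host(packages={"ntp", "vim"}, files={"/etc/motd": "hand edited\n"})

print(converge(spec, snowflake))  # [('install', 'nginx'), ('write', '/etc/motd')]
print(converge(spec, snowflake))  # []
```

Note that `vim` survives: the sketch only enforces what the spec declares and leaves anything else on the box alone, which is one reason a converged server can still drift toward being a snowflake.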