This is's Typepad Profile.
Recent Activity
Everyone thus far has been so nice in reflecting on Asay's article. I applaud you all. My own character flaw is exposed when I lose my poker face, as I have an extraordinary distaste for baseless, FUD-ish blogging. The issue I take with his article isn't so much the nature of his argument as how he tries to advance it. Does it not occur to him that someone who is allegedly a 'data science' mogul, and who makes a positional statement without providing supporting evidence via his own craft, might have a bit of a credibility problem? Of course, what he misses is the real context in which his argument resides. From a fundamental statistical, and more generally scientific, viewpoint, one cannot compare the outcomes of an apple and an orange simply by their visual attributes (that is one particular grudge I have with the infographics world today - another kvetch for another time). Naturally, he would have had to treat the *use case* as a stratum to add at least some substance to his thesis. That is, he needed to compare the intersection where Pythonistas and R users (and even dual users) converge. Programmatically (not syntactically), R and Python have several points of congruence. They are both multi-paradigm: array (of the two, R is better suited to vector programming), object-oriented, imperative, functional, procedural, and reflective (thank you, Wikipedia, for that nice summary so I didn't have to recall it from dusty texts). Technically, they can be used for similar things. Nothing new there. By contrast, R does not have as strict a typing discipline as Python, which can be both a strength and a weakness, depending again on your use case. The syntax and best coding practices do differ between R and Py. For those coming from purely OOP studies and experience with lower-level languages (C++, etc.), yes, there will be plenty of gripes.
Gee, we've never seen THAT before - yet these languages/environments persist and have their places, just as R and Python do (imagine those same folks being forced to learn SAS - I suspect the world's suicide rate would consequently increase fourfold :) ). However, if the objectives include time-to-model development and a primary focus on method, then R is far more mature in this regard. Note I didn't say better, worse, etc. (my mention of SAS unequivocally being an exception :) ). I personally use BOTH R and Py in my work, depending on the use case. I use other programming environments as well for the same reason: I don't believe that a single technology stack determines what is 'better or best' in DS. Sure, it would be interesting to see an R/Py or some other hybrid test R's mettle, one where code discipline is a bit more unified and translatable, with every scientific package imaginable, vector-based programming, and better scalability. Change can be good, and it is important. But my guess is you'd just end up with a mash-up in which R, Py, or whatever else each retains its own characteristics. Hmmm, I wonder about that RPy package thingy they have out there :). Even better, ever used RevoDeployR? No, I don't see either language being supplanted by the other, or either being a 'better' or 'worse' language overall. That's very myopic thinking. I believe Ruby was one such attempt at this experiment, and it certainly has its following, but it most certainly didn't diminish the importance of any of the languages that contributed to it.
---------------Let me diverge here ---------------
So I'm kvetching about one comparatively small issue in the universe... blame that one on my genes :). But I do believe that Asay's blog is a small contributing part of a much larger problem in the 'data science' arena.
It's as if there's a rather muted 'mortal combat' between those who are good at dangling shiny lights and those who are genuinely, measurably, and meaningfully impacting their objectives and the conglomerate discipline itself. In my mind, his blog very much resembles the article "The Death of the Statistician". You can google the title itself and find other references that seem to draw these boundaries, rather indistinctly, between two different disciplines trying to achieve similar final objectives. Where does this myopia come from? That's the easy one: economic/power advantage. This is nothing new, of course, either conceptually or historically. However, when this behavior extends into the scientific research world itself, many new problems emerge, mostly in the realm of general credibility and value (consider, for example, the histories of clinical science, big pharma, and medical devices :) ) - not a good thing. The data science world and all related parties should be *very* concerned about this (even about something as small as the aforementioned blog) and should consider appropriate action before the broad, sweeping black eyes begin to affect the credibility of the whole. We have much to do to deal with this in the 'storming and norming' of each of our areas surrounding the rather infant face of data science. Any science must maintain not only its creativity but also its rigor, within its methods and within its ranks, if it is to be a science at all. Many thanks to David for publishing this in a far better manner than I just did :)
Dec 9, 2013