blog.rupamsunyata.org

Decklin’s excuse for some blogging software. Est. 2006.

Quote

Mutable state is actually another form of manual memory management: every time you over-write a value you are making a decision that the old value is now garbage, regardless of what other part of the program might have been using it.

Paul Johnson

Artificial Intelligence

After a recent apt-get:

Fetched 1B in 0s (42B/s)

Well. That settles that question! :-)

Inside Out

I've been on Twitter for over a year and a half now. If I had to explain why I like it, and why it's so popular now, I would use an analogy:

Twitter : IRC :: Blogs : Usenet

(This applies equally to other "micro-blogging" services, but I am about to explain why I believe that's not the right metaphor. You may also substitute mailing lists for Usenet.)

With the older media, you had places -- newsgroups or channels -- that people went to, each with a distinct culture, and that (mostly) weren't "owned" by any one person, but rather by the community. With the new ones, we are all sole proprietors of our own streams, and we "tune in" to the subset of people we find interesting, rather than topics we invest in. So, instead of bumping into the same person in a couple different groups, or never reading their words at all, you might find that your feeds overlap a bit more than they do with most people. This is how I find people to "follow" and blogs to read, in fact -- as my network expands, more people become loosely joined to it, and as I notice ones worth reading I add them.

Is one model better than the other? Probably not. I could make an analogy to music. In theory, I'd rather read what my favorite critics have to say about a wide variety of new releases -- some of which I'd never know about otherwise -- than keep up with the discussion of bands and genres I really like, even if most people writing about them are terrible (remember, 90% of everything is crap). But I also lose that sense of community of being a "fan" of something; I no longer have a deep connection to what's going on in the fandom or the scene, which I also would never know about otherwise (some of it is just too obscure for my favorite writers to cover).

In practice, it seems the new approach has been more popular, but maybe that's because more people are on the net now, and both kinds of communication are/were shaped by the technology available at the time (destinations make much more sense with limited/centralized computing resources, and aggregation makes much more sense with powerful clients and a wider, less specialized user base).

Anyway. This post is not actually about social media theory or whatever you want to call it; it's about some software I have packaged.

Because of all the above, I have always wished that I could use Twitter from something more like my IRC client. Like, say, my IRC client. One could abuse the concepts of IRC to make an "on the fly" channel of whoever happens to be in my feed. (I once read a blog comment somewhere complaining that Twitter could easily be implemented on any existing IRC server using one +m channel for everyone and some client-side direction of messages from all such channels to a single window. +1 for cleverness, but they did sort of miss the point of why normal people sign up for web sites rather than installing and configuring clients for obscure chat protocols.)

So, I had grand plans to write a Twitter client which was an IRC server, with some clever mapping of IRC concepts and commands to their equivalents over there. I never got around to it, as I barely have enough time for anything I do now. But last month I noticed that someone else had implemented such a thing: tircd. I had seen Twitter/IRC services before, but like the official Twitter XMPP service, they were all implemented as bots, which I detest for this sort of application. Bitlbee, for example, translates various IM protocols to IRC, but only halfway -- for anything else you have to use a bot as a sort of poor man's command line. If I want a command line for Twitter, I already have several; IRC can do better. And tircd really does! It's great. You're not required to edit the config file, and there's no extra layer on top of IRC for things like logging in or adding people.

I've finally packaged the latest release, which is waiting in NEW currently. I got a request for sneak preview packages, so if you want to install some unapproved .debs check this repository.

I may still pick my own project back up, as Twitter itself, being a centralized service, feels like a stopgap solution on the way to a more generalized 140-character equivalent of the, er, blogosphere as envisioned by open-source projects like Identi.ca. In the future, perhaps, when we use a certain micro-blogging "service" we might be randomly connected to one of any number of servers run by different individuals but all mirroring messages back and forth to each other. Which, now that I think of it, sounds vaguely like some obscure, obsolete chat and news-posting protocols I know.

Another Monopoly Rant

A cautionary tale: Verizon's DSL salespeople are completely incompetent.

Several weeks ago, I ordered residential service. I was unable to get the prequalification app on their web site to work for my address, so I called in. I was told (1) I could get a modem for free, and (2) that I didn't need to do anything for the install except plug it in -- they'd just be turning something on at the CO, perhaps even before the date my service was scheduled to begin.

These were both, of course, pure fiction. On the install date, while at work, I got a call from a tech who had shown up at my house to rewire something. Luckily, I was able to rendezvous with him later in the day. The service worked briefly and then went down. Over the next few days, I called in, was told to wait at home for 8 hours, no one showed up, I called in again, was told "oh, sorry, it's going to be tomorrow", waited all day the next day, no one showed up again... a week and two actual visits later, I finally had connectivity.

I can't really blame them for whatever the line problem was, but to repeatedly treat my time as worthless by making me wait at home for an entire day just in case a tech can get there, even though it's highly unlikely they will, is just offensive.

Then, I received my first bill and... was charged for the modem. Calling in again, I was told that the deal I was offered was for online orders only, and even though I had specifically confirmed with the salesperson that he could get me the same deal since the web site wouldn't let me complete an order (and I already had an old, power-sucking modem anyway), there was no way I could get a refund. So I got an RMA and planned to go shopping for a replacement (people are selling the same model on Craigslist for $10). The RMA came in an email which also told me that if I did not return the equipment within X days I would be charged $100.

This is all utter bullshit. Do not use Verizon. If your only other option is Comcast, as it is here, just steal your neighbor's wireless or tether your 3G.

They could have left me reasonably happy with the whole botched situation if they had provided an honest estimate of how long it would really take to get someone out (even if it was "days"), or given me access to the same information without having to go through a call center of people only trained to read a script about how to power-cycle your router or whatever. And if they actually apologized for a salesperson convincing me to sign up by lying about what I would be charged. But why would they? What are you going to do, switch to an ISP that treats you like a human being and doesn't fuck with your traffic? Ha. Ha ha ha. Good luck with that.

You're talking about things I haven't done yet

I've been converting all my Mercurial repositories to Git. One of the motivations for this was hg's poor handling of branches and tags: branches are just another metadata field on a commit, and tags are entries in a text file called .hgtags that is tracked in the repository(!). Git has its flaws as well, but I prefer its view that branches and tags are just refs, which are pointers that exist at the repository level and are pushed or pulled around in much the same way as the objects they refer to.

The excellent hg-to-git.py script included in contrib created Git tags corresponding to my old Mercurial tags, but of course didn't modify the actual commits. So I still had a .hgtags file in my Git repository, and a boilerplate commit of the form "Add tag foo for changeset bar" each time I added to it (recall that .hgtags is just another file. Thankfully, I never had to deal with two branches where I added different tags...). I wanted to remove these. Git has a 'filter-branch' command that can totally rewrite or expunge commits; this is of course a horrible idea for already published code, but there's no harm in using it while initially preparing a repository.

While I appreciate git's object model and speed, I must agree with its detractors that the user interface is terrible. It took some tinkering to get git-filter-branch to do what I wanted, so I'm writing this to save my notes for next time (and in case someone else is searching for how to do this). Here's the command I arrived at:

git filter-branch \
    --tag-name-filter cat \
    --index-filter 'git update-index --remove .hgtags' \
    --commit-filter \
        'if [ $# = 3 ] && git diff-tree --quiet $1 $3; then
             skip_commit "$@"
         else
             git commit-tree "$@"
         fi' \
    HEAD

The tag name filter is always necessary if you want tags to be updated to point to the corresponding commits on the new, rewritten branch. I consider this a UI failure -- when a branch is rewritten, the ref is modified, and the old one moved to refs/original. Tags, on the other hand, stay where they are, without any indication on the new branch that this is where you might want to move that old tag and sign it again or whatever. IMHO they ought to be handled the same as branches.

The index filter is simply an efficient way of removing the unwanted file from all commits. This and the tag filter are both covered in the manual page.

Writing a commit filter is a little more obscure. After .hgtags is removed from the index, we may end up at one of those useless "Added tag foo" commits and have no changes to record in the commit. By default, of course, filter-branch still records these -- the commit message might be useful, or something. But I want to suppress them.

The commit filter is called with a tree -- you're at the point between write-tree and commit-tree. (I recommend Git from the bottom up if you're confused here.) It gets that tree ($1), and then "-p PARENT" for each parent, just like commit-tree. So, if this is a normal commit with one parent, there will be 3 arguments, and this is the only case we want to mess with. (If there's only one argument, there is no parent, i.e., the first commit, and if there are more, then it's a merge.) If there are no changes between our tree and the parent's tree, then it's one of those no-op commits, and we can skip it (skip_commit, a shell function defined by filter-branch, uses some deep magic to hand us the original parent again next time).

I think diffing the index and the parent would work as well, but this seemed clearer. It still feels like a hack, so I'd love to hear from anyone who can suggest improvements. Since this is a special case, maybe it's better off being implemented in hg-to-git.py itself. There's always more than one way to do it.

Update: Teemu Likonen points out that the next version of Git (1.6.2, not yet in unstable) will have a --prune-empty option which makes this particular problem totally trivial. I am starting to get the feeling that the Git developers are all reading our minds... :-)

Film at 11

Managers are dicks.

(ObBookMeme: "What's hard is sorting the result list so that the good results show up near the top.")

(Something I am working on now that Njiiri has search, but who gives a fuck, right?)

A bit late, but still

Happy Armistice Day.

Generally, "deal with huge amounts of" means "identify the 98% you can ignore"

Hello America, I missed you

As we walked back toward Mass Ave to catch the bus last night, fresh from the euphoria of my friend Josef's election-night party, I thought about how fortunate we are to be here, now.

We cheered, we screamed, we sang the national anthem at the top of our lungs, popped champagne, and danced. I'm on some stranger's facebook somewhere, at the edge of a crowd of people smiling for what felt like the first time in years. I knew I'd remember, but I wanted to write something down. I'd been following along on Twitter all night, but I didn't know what to say when the good news finally came.

I thought about our friends in California, where Prop 8 looked like it was going to win. Despite how far we've come, it's likely three states are about to write discrimination into their laws or constitutions rather than out of them.

So I picked up my phone and just typed in, "charged." Everyone was excited, yes. I haven't felt electrified by the possibility of change like that in a very long time. But we also haven't automatically won change by changing our leaders. We have won a responsibility to make that change happen.

"Yes we did", we joked to each other, and I emailed to my parents in the morning. But no one actually meant, "yes we can win an election". That's just the starting line. Now, we are charged with making America better. We are charged with protecting the liberties of everyone. I wanted to remember that too.

That's why it matters that we won. Eight years from now, this could be a nation where the whole idea of "banning gay marriage" sounds as antiquated and offensive as segregated schools or not letting women vote. We have the ability to change our culture, nothing more. Mandates don't do that; people do. And yes, we can.

Tap that class

And now, something entirely impractical.

I picked up Beginning Ruby by Peter Cooper the other day to look for some teaching material. Flipping through, I came across a basic feature that had heretofore escaped my notice: the Struct class. If you want to use a class for nothing more than bundling together a few values, you can create a Struct instead of writing out initialize and attr_accessor yourself. The idiom is like this:

class Server < Struct.new(:host, :port, :password)
  def to_s
    port == 6600 ? host : "#{host}:#{port}"
  end
end

That particular one's from Njiiri (which may clue you in that it took me several months to get around to writing this post). For educational purposes, it's nice: you can point out how we give a name to a class (I've also tried to explain assignment as "naming" rather than "storing"), that there's an unnamed class (classes are also objects!), and overriding a method (which uses the dynamically defined stuff), without any boilerplate to get in the way.
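For comparison, here is roughly what the Struct version saves you from writing. This is only a sketch: LonghandServer is a made-up name for illustration, and the equivalence is approximate (the real Struct also gives you ==, to_a, and friends).

```ruby
# Hand-written counterpart to Struct.new(:host, :port, :password).
# LonghandServer is a hypothetical name used only for this comparison.
class LonghandServer
  attr_accessor :host, :port, :password

  def initialize(host = nil, port = nil, password = nil)
    @host = host
    @port = port
    @password = password
  end

  def to_s
    port == 6600 ? host : "#{host}:#{port}"
  end
end

# The Struct-based version from above, for side-by-side use.
class Server < Struct.new(:host, :port, :password)
  def to_s
    port == 6600 ? host : "#{host}:#{port}"
  end
end

default = Server.new("localhost", 6600)
custom  = LonghandServer.new("example.org", 6601)
puts default.to_s  # port 6600 is treated as the default and left off
puts custom.to_s
```

Both classes behave the same for this purpose; the Struct one just has nothing in it except the part that is actually interesting.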

So it's tempting to use this in a lesson. Unfortunately, Ruby is a little bit weird here, and you run into those distracting "practical" issues about the particular language you're working in. Your classes can't subclass Class, and thus you can't write a Foo of your own whose Foo.new returns a new class, the way Struct.new does. Struct is actually implemented in C (or Java, or whatever's native). So you have to do some handwaving.

This is because the "metaclasses" we have in Ruby are implemented, so to speak, as singleton classes of objects (including class objects). I do not mean this as a criticism of Matz at all, but they seem like more of a serendipitous thing than an intentional design -- "hey, if we implement classes in this particular way, we get this sort of metaclassing automatically [1]." (The clearest explanation of why this is that I've found is in the first chapter of Advanced Rails by Brad Ediger.)
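A small sketch of both halves of that claim, as I understand them: Class refuses to be subclassed, but any individual class object can still carry methods of its own in its singleton class. The widget name is a placeholder, not anything from a real project.

```ruby
# Trying to subclass Class raises TypeError.
subclass_error =
  begin
    Class.new(Class)
    nil
  rescue TypeError => e
    e
  end

# What we can do is instantiate Class and then define a method on that
# one object; it lands in the object's singleton class.
widget = Class.new
def widget.describe
  "a class object with its own singleton method"
end

puts subclass_error.class
puts widget.describe
puts widget.singleton_methods.inspect
```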

I wanted to be able to just subclass Class, rather than have all that fun power that we normally get to abuse in Ruby, only because I think this is the clearest way to explain the abstraction. Even I still have trouble describing singletons in plain English. Struct is intuitively a class of classes, and factors out similar/boring stuff -- a good practice. It could be an example of how to refactor some simple classes, if only we could follow it.

I decided to give up on the idea for my tutorial, but it kept me up at night. Ruby metaclasses clearly can do anything that the sort of Struct-like metaclasses I have in my mind -- "parametric" classes, if you will -- can do, and I can dynamically define just about anything; why not make it happen? We can instantiate Class itself, but the tool we have to shape that class is singleton methods. This can certainly be abstracted away. So, I whipped something up.

Before getting to it, though, let's visit a new construct in Ruby 1.9 and 1.8.7: Object#tap. Apart from the obvious debugging use described in its documentation, it makes it quite easy to factor out this pattern:

def gimme_a_thing
  thing = Thing.new
  thing.do_stuff_to_it
  thing
end

Into something closer to the style of functional or declarative programming:

def gimme_a_thing
  Thing.new.tap do |thing|
    thing.do_stuff_to_it
  end
end

(Well, "do stuff" is obviously still procedural, but A for effort.)

Which one is "better" could probably be the subject of much debate, but: I really prefer the second one; even though it has exactly the same effect, it looks like "what" rather than "how" (something else I try to beat into impressionable young heads. :-)), which is, I think, easier to write tests for. I'm going to use it here, because it's shorter.
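To check the "exactly the same effect" claim, here is a runnable version of both styles. Thing is still a stand-in; the log attribute and the two method names are invented here just so there is something to compare.

```ruby
# Thing and do_stuff_to_it are stand-ins; the log exists only so the
# two styles can be compared.
class Thing
  attr_reader :log

  def initialize
    @log = []
  end

  def do_stuff_to_it
    @log << :stuff
  end
end

def gimme_a_thing_imperative
  thing = Thing.new
  thing.do_stuff_to_it
  thing
end

def gimme_a_thing_tapped
  # tap yields the receiver and then returns the receiver, so the
  # block's own return value (the log array) is discarded.
  Thing.new.tap do |thing|
    thing.do_stuff_to_it
  end
end

puts gimme_a_thing_imperative.log.inspect
puts gimme_a_thing_tapped.log.inspect
```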

Now, Object#tap passes the object as an argument to the block; in our case, we are going to define a metaclass, so we want to work with self, instead. So we define a new version, class_tap -- by analogy with class_def, I suppose -- which class_evals the block rather than simply evaluating it:

class Class
  def class_tap(&blk)
    tap {|_self| _self.class_eval &blk }
  end
end
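A quick usage sketch, assuming the class_tap definition above (repeated here so the snippet runs on its own). The greeter class is invented for the example.

```ruby
class Class
  # Same definition as above, repeated so this snippet is self-contained.
  def class_tap(&blk)
    tap { |_self| _self.class_eval(&blk) }
  end
end

# Inside the block, self is the new class, so attr_accessor and def
# apply to it, and class_tap hands the class itself back.
greeter = Class.new.class_tap do
  attr_accessor :name

  def greet
    "hello, #{name}"
  end
end

g = greeter.new
g.name = "world"
puts g.greet
```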

And to do the following trickery, we make use of MetAid, written by why the lucky stiff -- it's very small, so we could always just incorporate the bits we want into the code here, but this short file provides a common vocabulary for talking about metaclass stuff which is quite valuable.
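For anyone without the file handy, this is roughly what the helpers used below do. It is written from memory, so treat it as an assumption about metaid rather than a copy of it; the counter class at the end is made up.

```ruby
# A sketch of the metaid helpers this post relies on (an assumption,
# not the verbatim file).
class Object
  # The object's singleton class (what the post calls its metaclass).
  def metaclass
    class << self; self; end
  end

  # Evaluate a block with self set to the singleton class.
  def meta_eval(&blk)
    metaclass.instance_eval(&blk)
  end

  # Define a singleton method on this object from a block.
  def meta_def(name, &blk)
    meta_eval { define_method(name, &blk) }
  end

  # Define an instance method on this class from a block.
  # Only meaningful when the receiver is a class or module.
  def class_def(name, &blk)
    class_eval { define_method(name, &blk) }
  end
end

counter = Class.new
counter.meta_def(:kind) { :counter }       # callable as counter.kind
counter.class_def(:bump) { |n| n + 1 }     # callable on instances

puts counter.kind.inspect
puts counter.new.bump(1)
```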

Now, we can write a method to create our new classes on the fly. Here's what I came up with:

require 'metaid'

class << Class
  def meta(_super=Object, &blk)
    new.class_tap do                     # 1
      meta_def :new do |*args|           # 2
        Class.new(_super).class_tap do   # 3
          class_exec *args, &blk         # 4
        end
      end
    end
  end
end

(Note the "class << Class", opening the singleton, rather than "class Class", opening Class itself. Also, the distinction between new and Class.new -- they are the same method, but from inside the meta_def we're no longer in the Class class.) The lines of meta itself mean:

  1. The value of this thing is a generated class, which we will describe thusly:
  2. Its singleton class has a new method, which gives you a value that is:
  3. Another generated class, which is defined by:
  4. the original block, which now gets run with new's arguments.

Nary an assignment in sight!

The (embarrassing) caveat is that the block that defines the new class cannot use def as normal for any method that needs the block's arguments. A def body is evaluated in a totally fresh scope and cannot see the surrounding local variables (I don't think I get the reasoning behind this decision, to be honest), so we need to use class_def instead. This is, I must admit, rather hideous. Perhaps someday the language will change.
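The scoping difference is easy to see in isolation. In this sketch (all names invented), the def body cannot see the local variable, while the block passed to define_method can, which is the same thing class_def relies on.

```ruby
prefix = "hello"

klass = Class.new do
  # def starts a fresh scope: the local variable prefix is not visible
  # here, so calling this method raises NameError.
  def via_def
    prefix
  end

  # A block is a closure, so define_method can still see prefix.
  define_method(:via_block) { prefix }
end

obj = klass.new
puts obj.via_block

begin
  obj.via_def
rescue NameError => e
  puts "via_def failed with #{e.class}"
end
```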

But, first things first: now we can implement Struct in pure Ruby.

class MyStruct < Class.meta \
  do |*args|
    attr_accessor *args
    class_def :initialize do |*instance_args|
      args.zip(instance_args).each do |attr, val|
        instance_variable_set :"@#{attr}", val
      end
    end
  end
end

The real Struct class does a few other nice things for you, but this is the heart of it; I can go back to that Njiiri example and just swap in MyStruct for Struct (Providing an instant performance gain of -200%... :-)).
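If you want to try the swap without installing metaid, here is a dependency-free variant of the same idea. It is a sketch, not the code above: standalone_meta, StandaloneStruct, and StandaloneServer are hypothetical names, and it assumes Ruby 1.9.2 or later for define_singleton_method.

```ruby
# Dependency-free variant of Class.meta (hypothetical helper name).
# Returns a "factory" class whose .new builds a fresh class each time.
def standalone_meta(superclass = Object, &blk)
  factory = Class.new
  factory.define_singleton_method(:new) do |*args|
    # Build a new class and run the original block inside it,
    # passing along the arguments given to .new.
    Class.new(superclass) { class_exec(*args, &blk) }
  end
  factory
end

StandaloneStruct = standalone_meta do |*attrs|
  attr_accessor(*attrs)
  # define_method takes a block, so attrs stays visible in the body.
  define_method(:initialize) do |*values|
    attrs.zip(values).each do |attr, val|
      instance_variable_set(:"@#{attr}", val)
    end
  end
end

class StandaloneServer < StandaloneStruct.new(:host, :port, :password)
  def to_s
    port == 6600 ? host : "#{host}:#{port}"
  end
end

puts StandaloneServer.new("localhost", 6600, nil).to_s
puts StandaloneServer.new("example.org", 6601, nil).to_s
```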

Here's a "shapes" example from my abandoned OO lesson:

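# MyStruct (above) keeps its members in instance variables; the built-in
# Struct does not, so @sides would be nil for a plain Polygon.new(...).
class Polygon < MyStruct.new(:sides)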
  def perimeter
    @sides.inject(&:+)
  end
end

class RegularPolygonClass < Class.meta(Polygon) \
  do |n_sides|
    class_def :initialize do |side_length|
      @sides = [side_length] * n_sides
    end
    class_def :area do
      @sides.size * @sides.first**2 / Math.tan(Math::PI / @sides.size) / 4
    end
  end
end

class Square < RegularPolygonClass.new(4); end
class Pentagon < RegularPolygonClass.new(5); end

(Note that area has no free variables and thus could actually be defined with def. It would just look funny.)
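The area method is the usual regular-polygon formula, n * s**2 / (4 * tan(pi / n)), for n sides of length s. Pulled out as a plain function (regular_polygon_area is a name invented for this sketch), it is easy to sanity-check against shapes whose area you already know:

```ruby
# Area of a regular polygon with n_sides sides of length side_length:
#   n * s**2 / (4 * tan(pi / n))
# Dividing by the Float from Math.tan keeps the result a Float.
def regular_polygon_area(n_sides, side_length)
  n_sides * side_length**2 / Math.tan(Math::PI / n_sides) / 4
end

square   = regular_polygon_area(4, 2)  # a 2x2 square, so about 4
triangle = regular_polygon_area(3, 2)  # equilateral, so about sqrt(3)

puts square
puts triangle
```

The results are floating point, so compare them with a tolerance rather than with ==.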

You only have to change one number here to make new polygon classes, rather than accumulating parameter lint or explicitly subclassing and redefining something implicit just for the derived class [2]. In a way it is the exact analogue of the imperative vs. tap style described above. There is quite a bit of aesthetics involved; one way is not "right".

Apart from the syntactical wart, I like being able to do things this way. Ruby is, as they say, optimized for programmer happiness and the principle of least surprise. Still, I'm sure it is quite slow, and I don't particularly need it for any real-world application right now. For an intro to OO it's way too much complexity to have lurking unexplained beneath the surface and still requires getting bogged down in the language you happen to be using. But hey! It's neat.

[1] I basically try to do this whenever possible, anyway. Design decisions are like sex: they're better when they're free.
[2] Of course, if Ruby class variables were not leaked across the entire inheritance hierarchy, something as trivial as this one-variable example would be... trivial. At least, few people would actually miss metaprogramming features. Blub blub blub.

Generated by Mnemosyne 0.12.