034 RR Benchmarking and Profiling
- Published on: December 24, 2011
The Rogues discuss benchmarking and profiling.
KID: There’s something down in my pants!
KID: [inaudible screams]. I want underwear!
JOSH: What the hell?
KID: I want underwear!
DAVID: This has to go…
KID: [inaudible screams]
DAVID: This has to go in the header [of the] show.
DAVID: There’s something in my pants! That’s great.
CHUCK: Hey everybody and welcome to episode 34 of the Ruby Rogues Podcast. This week on our panel, we have Aaron Patterson back.
JOSH: Woohoo! Yay.
CHUCK: Yeah, he has smart stuff to tell us.
AARON: Wait. Is this the Ruby Rogues Podcast?
AARON: I thought this was the Ruby Rouge Podcast.
DAVID: No hang-ups. No hang-ups. You’re here.
CHUCK: Yeah well, we’ll find an Avon person and send you some.
JOSH: Hang on. I just need to apply my eyeliner.
CHUCK: Alright. We also have on our panel, Avdi Grimm.
AVDI: Hello from Pennsylvania.
CHUCK: You have a new book that’s in alpha or beta or gamma or delta or something?
AVDI: [Laughs] Yeah, I’m working on this thing called ‘Objects on Rails’ which is about applying classic object-oriented thought to Rails programs.
CHUCK: Cool. Well, where do you go to get that?
AVDI: My weblog, avdi.org/devblog.
CHUCK: Ooh, make it hard. Alright, we also have David Brady from Shiny Systems.
DAVID: I’m actually here this week. And I’ve got dumb things to say.
CHUCK: Has anyone heard from Dave? Alright, so are you hiring people Dave? Is that what I’m hearing?
DAVID: Yes, yes. It’s that time of year, I guess. And everything’s just going crazy. I’ve got more work than I can do. And so, we’re doing a lot of Rails rescue projects right now. And you come in, you fix somebody’s server, and you go. People that have very specific pain points and really crappy code. It’s awesome.
CHUCK: Sounds like fun. Are you looking for contractors or employees?
DAVID: I’m looking for contractors right now.
CHUCK: Okay. Also on our podcast, we have Josh Susser.
JOSH: Good morning everyone.
CHUCK: Yeah, and I guess his company was featured on TechCrunch which is cool. You want to tell us a little about that?
JOSH: Oh, I wouldn’t call it featured. The news isn’t that we were on TechCrunch. The news is that we got into the Rock Health Accelerator Program for our startup. And I will now reveal the name of our startup. It’s Cognitive Health Innovations. There’s really no information about it online yet. However, there will be a launch Rock page where you can sign up and get information at some point soon.
AVDI: I just want to thank you for not naming your startup something like Blooply or something like that.
JOSH: Well, we were going to, but that was taken.
CHUCK: I was going to say, “I’m going to go register Blooply.com.”
DAVID: You could have done something like Nuttr or Crazr with no E. N-U-T-T-R.
JOSH: Okay, I am going to fly to Utah and slap you now.
DAVID: And I’m going to sit here and deserve it.
CHUCK: And I’m Charles Max Wood from Teach Me To Code. Yeah, I’m working on a few things here and I’m also looking for contractors. So yeah, you can email Dave. Dave, is it firstname.lastname@example.org?
DAVID: It’s dbrady@shiny…
DAVID: Yup. Look for me first, because I’m better.
CHUCK: [Laughs] Alright, well let’s get into the topic today. We’re talking about benchmarking and profiling. And I don’t know, to start us off, I want to talk about maybe benchmarking first, since it to me seems a little bit simpler. There’s a benchmarking piece of the core libraries for Ruby. Is anyone an expert on that? Because I’ve only used it for, “How long does this take? And how long does that take?” And then the numbers are littler on one than the other so I like it better.
JOSH: Hey, it’s been a while since we’ve done this. Let’s define benchmarking.
DAVID: I was about to call for a definition, yes.
JOSH: [Laughs] Okay, nice. Well, since I called for it, maybe you can supply it.
DAVID: Okay. So…
JOSH: Swallow first.
DAVID: I’m eating breakfast. It’s really sad. We have Christmas candy in the house and as we were starting the call I was like, “Wait, wait, wait. I have to go to the bathroom.” And I walked past and I grabbed a handful of chocolate-covered raisins. And I’m eating these as I’m sitting in the bathroom. And I realize, “This is what a no-op is.”
DAVID: So, benchmarking is where you take…
DAVID: …some operation and you test to see how long it takes. You basically grab something in isolation. And often, usually what you do in benchmarking is you do something over and over and over again. You repeat it 10,000 times, because often the thing that you’re measuring only takes a few microseconds or a few milliseconds. It’s down inside the Planck constant of what it takes to run, of what Ruby can measure for timing. So, you do it 10,000 times and see. It’s like big O notation. If we make O large enough, or if we make n large enough, then eventually we’ll start to see this thing, the performance, begin to degrade. And so, we can compare two algorithms this way.
Versus profiling, which is where you take the entire system, which is this big, complicated hairy mess. And you do the same kind of thing except that you basically are running it under some kind of thing that times everything, every function entry, every function exit. And profiling then lets you see, “Where am I spending the most of my time? What are my operations that have the most latency (which is a little bit different than spending the most time in something)?”
And profiling and benchmarking are very, very similar. Benchmarking is where you try to find out which algorithm is the fastest by swapping them in and out. And profiling is where you try to take a very complex, compound thing and say, “Where am I spending my time here? Which piece of this is the most time?” And if you’ve got a really big, hairy, long initialization loop that only runs once at the beginning of your program and you’ve got another loop that runs every single page load, you really want to focus on the page load, because that’s the thing that’s costing you the most down the line.
DAVID: And now we know. Don’t ask Dave for definitions.
AARON: I have never thought about it this hard. Wow.
CHUCK: That was directly out of Encyclopedia Brady-tannica.
DAVID: The Gospel according to Dave.
JOSH: Right. And my takeaway from that is that it’s not for amateurs. Or at least building the tools is not for amateurs. So, when Dave was talking about getting in the Planck constant, [chuckle] getting down to the fine resolution of things, I think there’s a lot of timing things you can do that give you a very coarse-grained measurement. Oh, my tests took 32 seconds to run my suite. That’s pretty easy to measure just at the terminal. But when you start getting into code, you need some decent tools to help you make sense of what’s going on because there is so much complexity involved in measuring what’s going on accurately.
AARON: I have to get something off my chest.
AARON: So, one thing that’s really important to me, I think that maintainable code is much more important than fast code. And I worry, because I’m afraid that we have all these benchmarking tools. And us as nerds, we love to measure things. And I really think that measuring the speed of some particular piece of code is much easier to do than measuring its maintainability. So, it really worries me that people focus a lot on speed and not on maintainability. So, I had to get that off my chest.
DAVID: That is brilliant. And I feel very strongly the same way. And what we ought to do is come up with a hybrid benchmark, which is where I hand you a piece of code and I say, “Tell me how long it takes you to understand this. Go.” [Makes timer sounds]
AVDI: WTFs per minute.
AARON: Well, honestly, the thing that I found is that as you get into, the more maintainable your code is, if you write nice, clean maintainable code, it’s probably also fast.
DAVID: Yes, yes.
CHUCK: Yeah. I think that’s almost always true. I want to just pick one little bone with you, and that is that sometimes the code is too slow.
CHUCK: But that’s really the only benchmark that matters as far as time. If it’s fast enough, then optimizing it at the expense of maintainability doesn’t make a lot of sense.
AARON: Right, yes.
AVDI: Well, and even if it’s slow, if you have nicely modular code, it’s often a lot easier to take that and make it fast than it is to take some code that somebody wrote only to be fast, which may or may not have succeeded, and then make it modular.
DAVID: I have a saying that I love to bandy about, which is that, “It is far easier to optimize correct code than it is to correct optimized code.”
JOSH: [Laughs] Yeah, never mind. [Laughs]
AVDI: Very, very true.
CHUCK: But yeah, at the same time, sometimes it’s just a matter of interest. I have two things that are syntactically correct or pretty easy to read. I want to know which one’s faster so you slap a Benchmark.bm do around it, you run it, and you figure out which one’s faster.
AARON: Yeah. Sorry I derailed everything. I just had to like, “Errr.”
DAVID: I don’t think that was a derail.
AVDI: Not at all.
CHUCK: No, I think it’s important.
DAVID: I don’t think it’s a derail, Aaron. There are two things that I wanted to get off my chest today, and that was one of them. And the other one is, especially in a thing about benchmarking and profiling, is you got to measure.
DAVID: If you go in and you replace – you’re using the hash inject, using inject to build up a hash, which was slow in 1.8.7 but it’s fast again in 1.9.2 – if you go in and change that from a single-line inject to a five-line loop that’s building up and then creating this hash and you don’t measure it, if it seems faster but you don’t have a measurement, (I’m stealing this quote directly from Steve O’Connell. I can’t take credit for this), but if you don’t measure it, the only thing you know for sure is that you made the code harder to read.
AVDI: Yeah, that is so very true.
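The "measure before you rewrite" advice above can be sketched with the standard library's Benchmark.bm, which Chuck mentions. This is a hedged illustration only: the key list, the iteration count, and the two lambdas (build_with_inject, build_with_loop) are made up for the example, and the numbers it prints will differ by Ruby version and machine.

```ruby
require "benchmark"

keys = %i[alpha beta gamma]   # hypothetical three-key input
iterations = 50_000           # arbitrary repeat count chosen for the sketch

# One-line version: fold the keys into a hash with inject.
build_with_inject = ->(list) { list.inject({}) { |hash, key| hash.merge(key => key.to_s) } }

# Longer version: fill the hash in an explicit loop.
build_with_loop = lambda do |list|
  hash = {}
  list.each { |key| hash[key] = key.to_s }
  hash
end

# The two versions must agree before their speed is worth comparing.
same_result = build_with_inject.call(keys) == build_with_loop.call(keys)
raise "implementations disagree" unless same_result

# Benchmark.bm prints user, system, total and real time for each labelled block.
timings = Benchmark.bm(8) do |x|
  x.report("inject:") { iterations.times { build_with_inject.call(keys) } }
  x.report("loop:")   { iterations.times { build_with_loop.call(keys) } }
end
```

Whichever version wins on a given Ruby, the report gives a number to weigh against the readability cost of the longer form.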
AARON: Yeah. Well, if you don’t measure anything, you can’t tell that you’ve improved anything without measuring it, which is why I think a lot of programmers focus on speed, is because that’s pretty easy to measure.
DAVID: Yeah. Well, the thing that kills me is that so many programmers focus on speed but they won’t measure speed. They will abandon an entire architecture, “Oh, we can’t do this with service-oriented stuff because it’ll be too slow.”
AARON: Oh, yeah.
DAVID: And I always come back and I say, “Wait. Wait. Wait. Wait. Wait. How slow is too slow? And is this that slow? Yes or no?”
DAVID: And they never have an answer to, “How slow is too slow?” which means it’s fast enough.
CHUCK: So, this leads into another question that I wanted to ask, and that is when do you want to benchmark or measure? Because some things aren’t worth measuring and in other cases they are. I’m hearing if you have a specification for speed, then that may be something you want to measure. Are there other instances or things that you want to make sure that you’re measuring in your code?
AARON: I don’t know. Typically, whenever I do benchmarks on stuff, it’s typically because it’s just… I can’t say that I do benchmarks preemptively. Most of the time, it’s in reaction to something that’s just too slow. I’m just doing something and it’s too slow. We’re serving up a request and the request is just too slow. So yeah, I don’t know.
DAVID: Okay, but you’ve touched on the key thing. You’re doing discovery-based profiling. You may not have a hard metric for what too slow is, but you know it’s too slow.
AARON: Yes. Yeah.
DAVID: You type rails c and you have to wait 17 seconds. That’s too slow.
DAVID: And when I run rspec spec, if it takes more than seven seconds, I call that my Twitter limit, which is the amount of time it takes me to get distracted from what I’m doing and go check Twitter. And if my spec suite does not run in under seven seconds, it’s too slow. And okay, well that’s a hard benchmark. We need to get this thing to run in under seven seconds.
AARON: You know what makes me laugh? I think as I’m progressing forward in my career, my attention span is getting shorter and shorter, because don’t tell anybody this, I know this is on a podcast, but don’t tell anybody this.
AARON: I used to be a Java programmer. Oh my god.
DAVID: Nobody listens to this podcast. It’s just us.
AARON: Good, good, good, good.
CHUCK: I was going to go [gasps].
AARON: And builds would take, builds, just building the code would take five to ten minutes. And then starting the stupid app, starting the stupid server would take five to ten minutes. And now, we’re complaining about seven seconds? [Laughs]
DAVID: I love it. I love it.
AARON: I think it’s an awesome problem. I’m like, “Yes! Seven seconds.”
DAVID: I remember those days. Yes, I was taken prisoner and forced to work as a slave in those mines at one point, myself. And yeah, we had a CI server and we had a rigid testing benchmark that your unit tests for the entire Java suite, for the entire thing end-to-end, your tests must run in under 30 minutes.
CHUCK: There you go.
DAVID: Yeah, yeah.
JOSH: You know, we’re laughing but there are plenty of people who still have to deal with that kind of stuff. [Chuckles]
CHUCK: Yeah, well I find it interesting that we’re talking about the speed in our code. And a lot of times we’re thinking specifically on user end response time and things. And here we’re talking about startup times and testing times and things like that, which are also important when we’re talking about speed. So, anything, you’re a consumer of your own code in that way. And any consumption, if it takes too long, should be measured and analyzed.
JOSH: So, let’s talk about benchmarking in particular, for a moment, since that’s half the topic. So, there’s the standard benchmark library built into Ruby’s standard library. And is there anyone here who hasn’t used that? So yeah, it’s incredibly easy to use. You just require the benchmark library. What is it called, BM?
JOSH: Great name.
AARON: Actually, I think it’s just Benchmark.
AVDI: It’s Benchmark.
JOSH: David, could you say something about that?
AARON: Yeah, it’s just Benchmark.
DAVID: These chocolate-covered raisins are delicious.
JOSH: Yeah, so it’s the Benchmark library. It just gives you some simple timing tools. You give it a block and it will report the performance of that block, how long it took, by running it repeatedly. You can tell it how many iterations to do. And I made that comment earlier about benchmarking isn’t for amateurs. I used to say memory management isn’t for amateurs. That’s why we invented garbage collectors, right? So, benchmarking is much the same. There’s a lot of stuff that seems obvious about it, but there’s a lot of subtlety. And it’s really good to use a package that can help you with your benchmarking because they’ve probably done some things to avoid some of the stupid obvious mistakes that you can make, like you don’t want to test the time that it takes for the control structure of the looping, that kind of thing.
AVDI: Right, and they make it easy, or that library makes it easy to run it a zillion times and then take the average.
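A small sketch of the "run it a zillion times and take the average" idea, using Benchmark.realtime from the standard library. Note that in the standard Benchmark module the repeat loop is written by the caller. The helper name average_seconds and the choice to subtract an empty loop (a rough answer to Josh's point about not timing the control structure) are assumptions made for this example, not part of the library.

```ruby
require "benchmark"

# Hypothetical helper: average wall-clock seconds per call of the given block.
def average_seconds(iterations)
  # Time the bare loop first so its overhead can be subtracted.
  empty  = Benchmark.realtime { iterations.times { } }
  loaded = Benchmark.realtime { iterations.times { yield } }
  # Clamp at zero in case timer noise makes the empty loop look slower.
  [loaded - empty, 0.0].max / iterations
end

per_call = average_seconds(10_000) { [5, 3, 1, 4, 2].sort }
puts format("sort: %.3f microseconds per call", per_call * 1_000_000)
```

The subtraction is crude, since both measurements carry noise, but it shows why a naive timing of very small operations can end up mostly measuring the loop.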
AARON: Have you guys seen Evan Phoenix’s benchmark suite gem?
JOSH: No, no. Tell us about it.
AARON: Okay. So, this one, it’s similar to the benchmark thing from standard library, except that one thing I really like about it is the syntax is almost the same. You just give it a block and it runs some code. But what’s cool is you just give it the block and it runs the code and it gives you back a measurement of iterations per second versus how long did this take total?
AARON: So, you could take two different algorithms and then just plug it in there and say, well, how many times could I execute this block in one second? Or actually, I think the default time is five seconds. So, it runs a block for five seconds and then you’re able to see which one could actually run and what is their deviation, stuff like that. It’s a very useful tool. So, look for that. Benchmark suite.
CHUCK: I think that’s interesting, because it’s basically the inverse of what you usually see which is seconds per iteration.
JOSH: That’s pretty cool.
AARON: Well, it’s nice because when you run with benchmark from standard library, you always have to figure out, “Well, do I run this 5,000 times?”
AARON: Or run this 10,000 times? And you don’t have to figure that out anymore.
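The iterations-per-second idea Aaron describes can be approximated with nothing but the standard library. The sketch below is hand-rolled and is not the gem's API: the method name iterations_per_second and the 0.2 second budget are assumptions for illustration (Aaron recalls a five second default for the gem).

```ruby
# Hypothetical helper: run the block repeatedly for a fixed time budget
# and report how many iterations completed per second.
def iterations_per_second(budget_seconds = 0.2)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  deadline = started + budget_seconds
  count = 0
  while clock.call < deadline
    yield
    count += 1
  end
  count / (clock.call - started)
end

small = (1..10).to_a
sum_rate    = iterations_per_second { small.sum }
inject_rate = iterations_per_second { small.inject(0) { |acc, n| acc + n } }

puts format("sum:    %.0f iterations/s", sum_rate)
puts format("inject: %.0f iterations/s", inject_rate)
```

Because the clock is read on every pass, this sketch inflates the cost of very cheap blocks. It also reports no deviation, which Aaron lists as part of the gem's output.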
JOSH: Yeah, so it normalizes the performance measurement.
DAVID: Yeah. Remember though that everything the system does for you, the system also does to you. The ability to plug in your benchmark and say I want to run this 10,000 times versus I want to run this 10 times is that a really long-running web scrape thing, 10 is actually going to be enough time. It’s going to take you 90 seconds to run this thing 10 times. And so, that’s enough to settle out the thing. But yeah, if you’re injecting up a hash with three keys, you’re going to need to run this 10,000 or 100,000 times before it will settle.
CHUCK: Yeah, or in our case, injecting up the Rogues with caffeine.
JOSH: So, there’s also the worry that benchmarks lie.
JOSH: And you got to be really careful. So, people can deceptively craft their benchmarks to make their code seem faster or more performant.
JOSH: Or more worth you paying them lots of money. And people can inadvertently mess up their benchmarking without intending to. And one of my early experiences with benchmarking was trying to get a handle on the performance of the Smalltalk virtual machine. And we were comparing different virtual machines. And I remember that somebody had come out with a new virtual machine. And it was one of the first pieces of software around that used what we now call just-in-time compilation where it would take a Smalltalk method that was in bytecodes and compile it to a bunch of machine code and then cache that around. And so, that’s a very common technique now. But that was developed by Peter Deutsch for Smalltalk virtual machine. And now, it’s used everywhere. So, we’re running the standard Smalltalk benchmark suite for the VM performance around that and it was crazy how much slower it was. And that’s because the Smalltalk suite ran a whole bunch of different methods in series.
DAVID: One time.
JOSH: Yeah. It would run each method one time. And then at the end of that thing, it would go and do it all again. So, if you took that benchmark suite and you turned it on its ear so that it took each method and ran it 10,000 times, and then did that in series, you got a completely different measurement of the performance of this virtual machine.
DAVID: Does anybody…
DAVID: Go ahead.
AARON: No, no. Go ahead.
DAVID: I was just going to say, is anybody besides Josh here, I can pick on Josh because he and I are old farts, but is anybody else here old enough to remember Whetstones and Dhrystones?
JOSH: [Laughs] Oh god, yeah.
DAVID: Yeah, okay. So, these were benchmarks that were for CPUs, for 286 CPUs, 386, the 486. And what they found is the 486 came out and it just blew the doors off all the benchmarks. And the reason it was, was because the engineers at AMD and Intel and Cyrix, back when they were still a company, they all sat down and looked at these benchmarks and said, “How can we kick ass on these benchmarks?” And they redesigned the CPU to fix the benchmarks. All of a sudden, L2 cache appeared, came into existence, specifically so that we could branch code and hold the entire benchmark in cache on the CPU, because it turns out that we spent all of our time fetching crap out, fetching code out of memory. And yeah, so benchmarks are BS.
DAVID: The other really obvious benchmarking mistake you will make is by printing your benchmark results to the terminal, which you basically say, on each run you say, this took 0.03 microseconds, this took 0.06 microseconds. I got news for you. If you are printing something to the terminal, your benchmarker and the code under test is going to spend all of its time printing to the terminal.
JOSH: You’ve now measured how long it takes to print to the terminal.
CHUCK: How long does it take you to do your job and tell me about it?
DAVID: Yeah. It takes longer. It’s easier done than said. [Chuckles]
AVDI: Well, the other thing to keep in mind with benchmarks is that it’s still pretty easy to mislead yourself in a couple of ways. If it’s a benchmark over a really broad, a whole lot of code, if you run that, you see that it’s slow, you may still come to a very wrong conclusion about why it’s slow. As a matter of fact, one of my general rules for optimizing is whatever the reason you think that the code is slow, you’re almost certainly wrong. I’ve done this so many times, come up with a hypothesis about why the code is slow after benchmarking it. And then rewrote it to address that issue and then discovered that, oh it wasn’t the algorithm after all. It was actually just a really slow database connection or something stupid like that.
DAVID: Okay, but you profiled, you tried something new, and then you profiled again, right?
AVDI: Well, so I’m making the distinction there between the benchmarking and the profiling.
AVDI: And another way, a sort of related way you can get into trouble is, okay, you see that the overall request is slow. And then you benchmark and then you make a hypothesis about, oh well, I’ve been suspicious that this algorithm that I’m using in this one place is slow anyway. So, I’m going to just go ahead, jump in, and benchmark that algorithm. And you find a way to double the speed of that algorithm. But then, the overall request time doesn’t go down at all, because it turns out that actually that algorithm was down in the noise.
AVDI: Performance-wise. And you should have been looking for something else.
AVDI: And so, that’s where you really have to force yourself to profile and look at what is actually causing the issues.
AVDI: And you have to learn how to read the profiler output because it’s not always obvious. And you might be thinking, oh, all of my program time is being spent in each. Each must be really, really slow.
CHUCK: Damn that each.
AVDI: If you’ve ever run, if you’ve ever looked at output from Ruby Prof, you may have at one point thought, wow. Each must be the slowest thing in the entire [inaudible].
AVDI: It’s because what you’re actually looking at, if you look at the numbers carefully, each itself isn’t taking that much time. It’s just that pretty much everything that your program does is inside a loop somewhere.
DAVID: There’s another block of time that’s really big and it’s almost as big as each and it comes right after each in the profile list, yeah.
CHUCK: Yeah. So, we’ve crossed the line into profiling. What tools do you guys use for profiling?
AARON: Ruby Prof. I’ll tell you what I use. Ruby Prof, I use Ruby Prof. I don’t use the profiler that’s built into Ruby because it’s way too slow. I use the…
DAVID: I have a real stupid question for you, Aaron.
DAVID: Are you talking about tools for 1.8 or 1.9?
AARON: Uhh, both?
DAVID: Because the last time I checked, now granted I’m a luddite and that was six months ago, the last time I checked, you couldn’t profile in Ruby 1.9.
AARON: Yeah, you can.
DAVID: Or maybe it’s code coverage you can’t get. You can’t get line number code coverage anymore.
AARON: Oh, you can. Actually, 1.9 has code coverage built in.
DAVID: Okay, this podcast just paid for itself.
CHUCK: [Laughs] You’re getting your money’s worth, Dave.
AARON: If you look, okay short side note. There’s a standard library thing called coverage. If you require it, then you can get code coverage statistics from your code. The important thing you have to remember is that the only code coverage statistics you’ll get is for code that is required after you require the coverage tool.
DAVID: I think SimpleCov does that, actually. I think SimpleCov uses that.
AARON: Yes. SimpleCov uses that. And the main one, there’s actually a strange kind of annoying bug. Actually, it’s not a bug. It’s just how it works, is that if you… Rails unloads files and then reloads them. If you reload a particular file, then the coverage for that file, the numbers get reset.
AARON: So, you have to make sure that your code is not reloaded when you’re doing the coverage. And if you’re doing coverage with Rails, it can mistakenly happen.
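A minimal sketch of the standard library's coverage tool as Aaron describes it, showing that only code loaded after Coverage.start is tracked. The file name coverage_demo.rb and its contents are invented for the example, and the script writes them to a temporary directory so it is self-contained.

```ruby
require "coverage"
require "tmpdir"

# Hypothetical file to measure: one method runs, the other never does.
source = <<~RUBY
  def covered
    "ran".upcase
  end

  def uncovered
    "never".upcase
  end

  covered
RUBY

line_counts = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "coverage_demo.rb")
  File.write(path, source)

  Coverage.start   # only files loaded after this point are tracked
  load path
  result = Coverage.result

  # Look the file up by name, since the recorded path may be expanded.
  _file, line_counts = result.find { |file, _| File.basename(file) == "coverage_demo.rb" }
end

# Each entry is an execution count for that line, or nil for lines with no code.
line_counts.each_with_index { |count, i| puts "line #{i + 1}: #{count.inspect}" }
```

A count of 0 marks a line that was loaded but never executed.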
AARON: So, just watch out for that. But anyway… Sorry.
DAVID: No, go ahead.
AARON: Profiler, yes I use the Ruby Prof, Google Perf Tools, perftools.rb, Aman’s gem.
AARON: It’s really awesome. I like that. And then recently, I’ve been using DTrace, but it’s a super fork. This is not for normal human beings. [Chuckles]
CHUCK: Are you not a normal human? Never mind.
DAVID: DTrace was the whole reason we invited you on. You can’t weasel out on this.
CHUCK: Right. So, we’re talking about profiling the Ruby virtual machine. I think it’s also interesting to point out that there are other systems that profile things like Rails apps and stuff, like New Relic, or Scout.
AARON: Oh yeah, New Relic. We use that a lot.
CHUCK: So, I just wanted to point that out. But I’m really curious about how you go about profiling the Ruby virtual machine. So, Aaron, you want to walk us through a little bit of the process and thought process on that?
AARON: So, well, actually the DTrace stuff that I added is for regular profiling. It’s for profiling your own methods and profiling whatever, any type of user land code that you want to do. And basically all it does is it hooks into Ruby’s virtual machine and intercepts calls. Whenever a call is sent on an object, it just intercepts that call and notifies the DTrace system, sends a message to the DTrace system saying, “Hey, some method was called.” And then you can count the number of times that a method was called.
But there are two kinds of profilers you can write with. You can count the number of times a method was called, but you might not care about that because maybe you’re calling some method a million times but that method takes zero time to execute basically. So, you don’t care. Really, what you want to care about is how long you’re spending inside of a method. So, I use sampling profiling for that. So, the idea behind sampling profiling is that you say, “Well okay, I have the same probability.” Let’s say that the probability to be inside of any particular method is even. So, if you sample and say it turns out that, well, every time you sample, you’re inside this one method. Well, that must mean you’re spending a lot of time in that method. So, it’s a weird way to think about it, but…
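The call-counting style of profiler can be sketched without DTrace. The example below swaps in Ruby's built-in TracePoint API (available in later Ruby versions than the ones discussed here) as a stand-in for the DTrace probes. The methods tiny_helper and do_work are hypothetical.

```ruby
# Hypothetical workload: one method called many times, one called once.
def tiny_helper
  :ok
end

def do_work
  1_000.times { tiny_helper }
end

call_counts = Hash.new(0)

# :call fires each time a method defined in Ruby is entered.
tracer = TracePoint.new(:call) { |tp| call_counts[tp.method_id] += 1 }
tracer.enable { do_work }

call_counts.sort_by { |_, count| -count }.each do |name, count|
  puts format("%-12s %d calls", name, count)
end
```

As Aaron notes, a high count by itself says little: tiny_helper tops this list while doing almost no work per call.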
DAVID: Wait. So, your profiler just asks, just comes in randomly and says, “Whatcha doing?”
JOSH: Oh yeah, that’s a common technique. That was the first kind of profiler that I used, was a sampling profiler.
DAVID: You have just validated my entire childhood.
DAVID: I was sample profiling all the adults around me. “Whatcha doing? Whatcha doing?”
AARON: Whatcha doing? Well, the problem is when you sample, it has to be at regular intervals. So, it can’t be random.
DAVID: Yeah, I didn’t do that. [Laughs] I didn’t do that. [Chuckles]
AARON: But yeah, I don’t know. The perftools.rb, that one does, that’s a sampling profiler as well. The weird thing is you’ll get numbers out of that which are basically percentages. Since you’re sampling, you can’t tell how long you’ve been in a particular method. You just know that some percentage of time, you were in that method.
CHUCK: Right, how frequently when you asked, it was doing whatever?
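A rough sketch of a sampling profiler in plain Ruby, to make the "Whatcha doing?" idea concrete. It is not how perftools.rb or DTrace work internally. The names profile_by_sampling, hot_method, and cold_method are invented, and on CRuby the sampler thread competes for the interpreter lock, so the real interval may be much coarser than the one requested.

```ruby
# Sample the calling thread at roughly regular intervals and count which
# method is on top of its stack each time.
def profile_by_sampling(interval: 0.005)
  target = Thread.current
  counts = Hash.new(0)
  running = true
  sampler = Thread.new do
    while running
      sleep interval
      break unless running
      frame = target.backtrace_locations(0, 1)&.first
      counts[frame.base_label] += 1 if frame
    end
  end
  yield
  counts
ensure
  running = false
  sampler&.join
end

# Hypothetical workload: hot_method does about 1000 times more work per call.
def hot_method
  i = 0
  i += 1 while i < 200_000
end

def cold_method
  i = 0
  i += 1 while i < 200
end

samples = profile_by_sampling do
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 1.0
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    hot_method
    cold_method
  end
end

total = samples.values.sum
samples.sort_by { |_, n| -n }.each do |name, n|
  puts format("%-12s %5.1f%% of %d samples", name, 100.0 * n / total, total)
end
```

The output is a share of samples per method, not a duration, which is the limitation Aaron describes.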
DAVID: You have no ability to determine Heisenberg’s velocity using this profiler.
JOSH: You also don’t know what the call tree looks like.
JOSH: So, if you were in Ruby Prof and you look at the call tree that comes out, oftentimes you can see, oh, there’s one method and sometimes it takes a lot of time if it’s called from one caller. And other times it doesn’t if it’s called from a different caller. And that’s useful information.
AARON: Oh, yeah.
JOSH: You’re not going to get that when you’re doing the sampling profiler.
CHUCK: So, Ruby Prof is the Patriot Act on your application? It’s watching everything. It’s watching [inaudible]
DAVID: You know, I was waiting for everybody else to laugh, but I was personally kind of horrified by that thought.
DAVID: And then I realized, so is everyone else.
DAVID: It’s like the TSA. It’s like, “No, we’re not going with that joke.”
JOSH: Anyway, sampling is really great to know where to apply attention. But you got to be careful because if you have code, you can actually have code that’s running in your system that is taking a huge percentage of your time, but because of the way sampling profiling works, it can miss it almost entirely. It can be something that runs, oh, this is taking up 10% of my program’s execution, but it’s doing it in small enough chunks. And the way the instrumentation for the sampling profiling works is such that it always misses that tiny little bit.
AARON: Well, you have to, I think when you’re doing sampling, at least I’ve found when I’m doing sampling profiling, it’s usually the places that I’ll find that are slow are what I’d call basically leaf nodes in your program.
JOSH: Yeah, inner loop.
AARON: Yeah, exactly. And you can’t… So, typically what I’ll do is I’ll say, well, oh that inner loop is slow. But I don’t want to look at that. I probably want to look up the call stack from that. It just gives me, it points me in the right direction, basically.
CHUCK: So, it’s a quick smell test.
JOSH: Yeah, but as with benchmarking code, profiling code lies too, your profilers.
AARON: Yeah, oh yeah.
CHUCK: Yeah. Well, when I was in college, I took a signals class and it was the same thing. If your sampling frequency corresponds with something else. I think the classic example that they gave us was if when you’re watching TV and it looks like the car’s tire is spinning backward, even though the car is moving forward, that’s a sampling problem where it’s actually taking the series of pictures at just the right moment so that the rotation of the tire appears to have turned backwards slightly instead of forward slightly, because it’s spinning just at the right speed to do that.
JOSH: Yeah. I love the example where you have a strobe light that’s perfectly in sync with the speed of water droplets falling from a tap. And it looks like they’re hovering suspended in mid-air.
CHUCK: Right, yeah, same kind of thing. So, what you’re seeing and what’s actually happening are two different things. But you can usually work your way around that either by changing the sample rate or looking at what’s going on with maybe some other tools and then getting a feel for, okay, this is really actually what’s happening. I can measure that the volume of water is slightly greater, or this or that, and get an idea of some other measures that you can get to get an idea what’s going on.
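The blind spot Josh and Chuck describe can be shown with a deterministic toy simulation rather than a real profiler. Everything here is invented for illustration: a timeline of 1,000 ticks in which every tenth tick belongs to a method called short_burst (10% of the run), sampled once at a period that lines up with the pattern and once at a period that does not.

```ruby
# Simulated timeline: every 10th tick is spent in :short_burst, the rest in :main_loop.
def method_at(tick)
  tick % 10 == 9 ? :short_burst : :main_loop
end

# Take one sample every `period` ticks and count what was running.
def sample_timeline(total_ticks, period)
  counts = Hash.new(0)
  (0...total_ticks).step(period) { |tick| counts[method_at(tick)] += 1 }
  counts
end

in_sync  = sample_timeline(1_000, 10)  # period matches the workload's rhythm
off_beat = sample_timeline(1_000, 7)   # period does not divide the rhythm

puts "period 10: #{in_sync.inspect}"
puts "period 7:  #{off_beat.inspect}"
```

With a period of 10, all 100 samples land in main_loop and short_burst is never seen. With a period of 7, short_burst shows up in 14 of 143 samples, close to its true 10% share. That is the kind of change in sample rate Chuck suggests.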
AARON: Oh, I got to get one other thing off my chest here too. When people give you benchmarks, you probably shouldn’t believe them.
JOSH: Even if they come from you?
AARON: Even if they come from me. Seriously, you should test it. You should absolutely test it. I really get annoyed when I hear people say, well, I heard that X, Y, Z was faster, and actually, there’s one rumor, I’ve actually heard this several times and I think that it’s my fault. I actually think this one was my fault. I was trolling people, but I don’t think people realized I was trolling them.
CHUCK: Oh no.
AARON: And it turned into a fact, a quote fact. And basically, what that was is I think I’m one of the five Rubyists that write their Ruby method definitions without parentheses.
JOSH: [Chuckles] Oh yeah, Seattle style.
AARON: Yes. And nobody, people kept hassling me. Why do you do this? Why do you do this? And the truth is I just like it.
DAVID: And you told somebody it was faster, didn’t you?
AARON: Yes! I told someone it was faster.
AARON: So, the way that I proved it, “proved”, I know you can’t see my air quotes here on the podcast, but “proved” it was faster is if you run ruby -y against some code, Ruby actually outputs the parser states that it goes through. So, it actually outputs the states of the parser. And if you write a method definition without parentheses, it actually goes through fewer parser states.
AARON: Than if you use with parentheses.
AARON: So, this was my “proof”. But then, months later, I completely forgot about it and I just heard through people. Somebody was like, what is this thing about methods without parentheses are so much faster? I’m like, what?
CHUCK: Oh, man.
JOSH: Okay, folks. You heard it here first. Never believe anything Aaron tells you.
AARON: Yes! Don’t.
AVDI: That is awesome.
CHUCK: Well, I think there’s another good point here. And that is that a good benchmark isn’t just going to give you a list of numbers. It’s going to say, look, this is the process that I went through. This is the code I wrote. This is the code I tested. This is the process I went through. And that way, you can actually duplicate it on your machine, with your VM, and make it happen in your thing. And then you can tweak it around and you could say, you know, this is mostly right. But under these circumstances, it doesn’t hold up, which tends to tell me that this is the case rather than the conclusion that was come to. And you can make an educated decision about what you’re going to use and what you’re going to believe, as opposed to just somebody coming out and saying, well, if I do an each do with a parameter versus using the curly braces, then that’s faster because X, Y and Z. You can try it out and actually know that.
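A benchmark that shows its work, in the sense Chuck describes, might look something like the sketch below. It uses Ruby's standard `Benchmark` module; the method names, the data, and the iteration count are all assumptions made up for illustration, and the numbers it prints mean nothing until you run it on your own machine and Ruby version.

```ruby
require "benchmark"

ITERATIONS = 200_000
NUMBERS = (1..10).to_a

# Candidate 1: block written with do/end.
def sum_with_do_end(numbers)
  sum = 0
  numbers.each do |n|
    sum += n
  end
  sum
end

# Candidate 2: the same block written with curly braces.
def sum_with_braces(numbers)
  sum = 0
  numbers.each { |n| sum += n }
  sum
end

# Wall-clock seconds taken to run the given block `iterations` times.
def time_it(iterations)
  Benchmark.realtime { iterations.times { yield } }
end

timings = {
  "do/end" => time_it(ITERATIONS) { sum_with_do_end(NUMBERS) },
  "braces" => time_it(ITERATIONS) { sum_with_braces(NUMBERS) }
}

# Report the environment along with the numbers so others can reproduce it.
puts "ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}, #{ITERATIONS} iterations"
timings.each { |label, seconds| puts format("%-8s %.4fs", label, seconds) }
```

Publishing the script alongside the timings lets a reader change the data size or the Ruby implementation and see whether the conclusion still holds.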
AARON: Yeah. I guess we were talking about this earlier. I just don’t like it when I hear developers say, well, I don’t want to do something that way because it’s slow.
AARON: And it’s something like this.
DAVID: The thing to remember is that slow is a human perception. And it’s valid as a motivation, but it is not valid as criteria until you actually quantify it. If you won’t tell me how slow is too slow, then shut up and write it clean.
DAVID: And we’ll figure out fast later.
CHUCK: I find that interesting too, because again, if you’re sitting at the doctor’s office, “Oh, I’ve been here for 20 minutes,” and you look at your watch and it’s been three, versus you get into coding and you’re like, “Well, I’ve only been coding for about five minutes,” and you look up and it’s been a half hour. We’re really bad at estimating that a lot of the time.
AVDI: So, we were talking about communicating the process. And I was wondering Aaron, if you would just walk us through a recent profiling story.
AARON: Oh, yeah, I guess I’ll talk about the one I blogged about recently. I’ve been trying to deal with, so Rails startup time is too slow, right?
CHUCK: Always has been.
AARON: And for me, too slow, I don’t know. So, I do open source stuff. And I do it for work. And we have a lot of developers, so I’m trying to increase their productivity. And startup time impacts productivity. The thing is, with Rails you can’t just say, “Well, okay. Well, we only start up once, so it doesn’t matter,” because that’s not actually true. You actually start up when you run your tests.
AARON: So, we’re actually doing the startup a lot. So, it makes a huge difference. And for me, just developing against Rails is just way too slow. So, I decided to profile it. And it turns out that I would go through, I hacked the Ruby VM to add DTrace because I wanted to do that. And then I used that for my profiling and found that, “Well, okay. We’re spending a lot of time calling to_s on symbols.” And I’m like, “Okay.” Well, we’re calling to_s, basically running rake environment would call to_s on a symbol over 200,000 times, something.
AARON: Yeah. And what’s really awesome about that is every time you call to_s on a symbol, it creates a new object.
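The allocation Aaron mentions is easy to check for yourself. This is a small sketch with a made-up helper name (`symbol_strings`); it reflects the Ruby versions I am familiar with, so verify it on the interpreter you actually run.

```ruby
# Call to_s on the same symbol several times and keep every result.
def symbol_strings(symbol, count)
  Array.new(count) { symbol.to_s }
end

strings = symbol_strings(:environment, 3)

puts strings.uniq.size                   # number of distinct contents
puts strings.map(&:object_id).uniq.size  # number of distinct String objects
```

If the second number matches the number of calls, every call produced a fresh String, and a couple hundred thousand calls means a couple hundred thousand short-lived objects for the garbage collector.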
AARON: And what it would do is it would say, as soon as you required Envjs, it would shell out to try and find each of those processes. And it wouldn’t stop. If it found one, it wouldn’t stop. It didn’t care. It just went through all of them. [Laughs] So, I fixed it to stop, but still shelling out is super slow. So, I ended up changing that to use, just search the path, search path. But yeah, getting rid of that shaved a few seconds off of startup time. Actually the main speed improvements that we’ve seen now are getting rid of that to_s, which all came from Rails’ constant reloading. So, yay.
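Searching the path in-process instead of shelling out could be sketched as below. This is not the actual patch Aaron describes; the method name `find_executable` and its arguments are assumptions for illustration. It walks the directories in `PATH`, stops at the first match, and never starts a subprocess.

```ruby
# Return the full path of the first executable called `name` found in the
# given search path, or nil when there is none. No subprocess is started.
def find_executable(name, search_path = ENV.fetch("PATH", ""))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

puts find_executable("ruby").inspect
```

The early `return` is the "stop when you find one" fix, and the file checks replace the shell-out.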
DAVID: So, I just learned something. And I hope other people listening to the podcast also just learned this, which is that if you put Ruby code in your gem spec that does any kind of shelling out to dig around in the file system, that’s going to get called every time you start Rails. So, stop doing that.
AARON: So, the problem is, well, the thing that sucks is, so basically, I don’t know. Actually, I don’t want to go there.
AARON: It gets me into a long rant.
JOSH: Rant, rant, rant, rant.
DAVID: Do not anger the Aaron.
AARON: No, okay, okay. Reader’s Digest summary is if you’re developing against [sighs] we have to have a manifest of the files that we’re going to require in a particular gem and that’s what your gem spec contains. And most people don’t want to generate that in advance. They like to use Git to do that, which means they have to shell out to ask Git to do that. But that means that evaluating the gem spec, every time you evaluate the gem spec, you have to do work, versus if you had just generated that gem spec in advance, you don’t have to do any work.
AARON: And I actually like, I actually prefer that style, because I make a lot of mistakes and I end up with vimrc, vim crap in my gems. If you go through some of the first gems that I ever published, I guarantee you’ll find swap files.
AARON: It’s pretty embarrassing. But I don’t know. I understand that people want to be lazy, because I’m very lazy. But…
DAVID: You know, I hate that though, when I open up a gem spec and it says files to include and there’s this line noise instead of the list of the files that are included. I’m not a Ruby interpreter and I can’t read this thing. And so, I actually have started putting a gemspec.erb in my gems. And when I get ready to publish the gem, I actually just stomp it down to a gem spec. And it’s fully expanded. And it has the complete file list in the thing. So, I think I might accidentally be on your friends list.
AARON: [Laughs] Yes, you are.
AARON: If you’re generating your gem spec in advance, it’s fine.
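The difference between the two gemspec styles can be sketched like this. The gem name, version, author, and file names are all hypothetical; the commented-out line shows the common Git-based idiom, and the live code uses a list written out ahead of time.

```ruby
require "rubygems"

# Hypothetical file list, generated before publishing and checked in.
STATIC_FILES = %w[
  README.md
  lib/example_gem.rb
  lib/example_gem/version.rb
]

def build_spec
  Gem::Specification.new do |s|
    s.name    = "example_gem" # hypothetical gem
    s.version = "0.1.0"
    s.summary = "Gemspec with a file list generated in advance"
    s.authors = ["Example Author"]

    # Computed on every evaluation of the gemspec (starts a git subprocess):
    #   s.files = `git ls-files`.split("\n")
    # Written out ahead of time (no work when the gemspec is evaluated):
    s.files = STATIC_FILES
  end
end

puts build_spec.files
```

With the static list, a reader can also see exactly what ships in the gem without running anything.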
CHUCK: Yeah. Dave, do you have an example of that that we can link to in the show notes?
DAVID: I think migratrix does it. If not, I will dig something up for the show notes, yes.
CHUCK: Okay, just shoot me a link.
AVDI: Well, I think the obvious take away from that whole story is that we should all mindlessly and unthinkingly avoid using to_s and shelling out in all of our [inaudible].
AVDI: Because it’s slow.
AARON: Yeah, why are you calling to_s, why are you calling to_s so much? Come on.
CHUCK: Yeah, I can just see the Ruby zombies. To_s.
AARON: Actually, it was interesting. Where that code came from was, so in Ruby 1.8 when you say some_constant.constants, you ask it for its list of constants, it’s actually a list of strings.
DAVID: A list of strings, yeah.
AARON: Versus when you do that in 1.9, it’s a list of symbols. So, if you want to write this code to work on 1.8 and 1.9…
CHUCK: Oh. That makes sense.
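Code that has to treat constant names the same way on both versions tends to normalize them, roughly as in the sketch below. This is an illustration of the pattern, not the actual Rails code; the module `Shapes` and the method `constant_names` are made up.

```ruby
module Shapes
  Circle = Class.new
  Square = Class.new
end

# Normalize constant names to strings, whatever the interpreter returns.
# On an interpreter that returns symbols, every call here allocates a String.
def constant_names(mod)
  mod.constants.map { |name| name.to_s }
end

puts constant_names(Shapes).sort.inspect
```

Run over every module that gets registered, a normalization like this is one plausible way to end up with a very large number of to_s calls.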
AARON: What are you going to do? So, that was part of it. But what we actually ended up doing is putting off, what’s funny is this constant reloading code was being activated when you run rake environment. And when you run rake environment, actually no user code has been loaded. Only Rails libraries are being loaded. And yet, all the Rails libraries were being registered with all of the constant reloading hooks and stuff. But you’ll never, ever reload Rails code itself.
AARON: So, what we ended up doing is pushing that down, pushing that off to not activate the reloading code until user land code hit. So, then we don’t have to spend so much time inside of the reloader with just normal library code.
CHUCK: That’s interesting.
DAVID: Very cool.
CHUCK: Alright, well I hate to cut this off, because I think there’s more that we could talk about, but we want to try and keep this to an hour. So, we’re going to go ahead and get into the picks. Let’s start with Josh this time.
JOSH: Okay. Let’s start with me. So, I have a cool little tool that I’ve been using. I saw it tweeted about a week or two ago. I’ll give Blake Mizerany the credit for the first tweet I saw about it. And it’s called Divvy.
JOSH: So, I’m sure some of you have heard of it already. It’s a tool for letting you resize windows really quickly. And if you’re doing a lot of programming and you got a bunch of windows on the screen, oftentimes you want to have a couple of your windows zoomed up and each take half your screen. Or oh, this should take the whole screen, or oh, I want this small in the corner, or focused in the middle. There are a couple of tools out there for doing that that I’ve used before like SizeUp is one. Divvy is just awesome. It’s the nicest combination of using graphically describing things using the mouse and using hotkeys. And has an incredibly flexible system for setting up hotkeys and for setting up proportions of the screen that you want to use. I love it. It’s definitely worth the couple of bucks. It’s in the Mac App store. So, I can recommend that. That’s really awesome.
And then I’ll just go for the Netflix streaming pick again. I watched this series called ‘How the Universe Works’. Originally, my educational career started out as astronomy. So, I [inaudible] Astronomy at Caltech, yeah. That was predictable. [Laughs] And I really quickly figured out that astronomy was basically all math. And I didn’t want to spend my life doing math, so I went into computers where you don’t actually have to know any math.
JOSH: But I still love astronomy. And ‘How the Universe Works’ is a very recently done series. It was done in the last year. And so, a lot of the information in it is pretty current. A lot of the stuff about the number of exoplanets is not really up to date. But mostly, it’s really good current information. They have engaging speaking. The animations are pretty cool. And there’s a half dozen episodes or so and they talk about everything from black holes to cosmology and planetary formation. So yeah, it’s a pretty good series. And I find it’s pretty accessible and yet it goes into enough detail to have it be interesting. So, educate yourself. Check it out. See where we all come from.
JOSH: That’s it for me.
CHUCK: Dave, what are your picks?
DAVID: My picks are three books that are tangentially related to what we talked about today. And that is ‘Freakonomics’, ‘Predictably Irrational’, which I think might have been a pick in the past, and ‘Freakonomics 2’. And all three of these are my picks today because they get into hard numbers, hard experiments about the way people do things and the way they behave, and the surprising rationales and the surprising reasons for why things that seem erratic, seem to be noisy, seem to be chaotic, that if you look at them just right and understand what’s going on underneath the skull plate there of a human being, that it suddenly becomes a very rational thing. It suddenly becomes a very almost deterministic event. And ‘Freakonomics’, if you just haven’t read it, you’re just not living right.
DAVID: Just the surprising discoveries that he made. And he’s an economist. And you wouldn’t think a book on economy or economics could possibly be interesting. If you took economics in my high school, you know that it’s got to be the highest suicide rate job right after dentists of anything. And then you go read ‘Freakonomics’ and you come away thinking, “Man, I want to be an economist, because these people figure out what’s going on.” And yeah, all three of these books really, they benchmark and profile the human brain. What are the odds that somebody will pick this option? If you have one option or two options, if you introduce a third option, nobody will buy the third option ever, but sales of the second option will go up, that sort of thing. Yeah, just fantastic.
And my follow-on pick for that is I love Divvy. I also love SizeUp, which is a precursor to Divvy. And I use them both even though in reality, you should just get Divvy and then just use the shortcuts from SizeUp and put them into Divvy. I just haven’t gotten around to putting them into Divvy. [Chuckles] Because SizeUp just works great. SizeUp’s a lot more simple. You can only move your windows to the corners or to the half or to the center of the screen. But it’s good. So, that was my picks.
CHUCK: Alright, terrific. I’m going to go ahead and go next. Lately, I listen to a lot of podcasts. I don’t think that’s a surprise to anybody. But lately, there have been a couple of podcasts that I’ve been listening to that are related to Dave Ramsey. And he has one show where he talks about money and getting out of debt. And then they recently launched another podcast based on his latest book called ‘Entreleadership’. And I haven’t read the book, but the podcast itself is just incredible. And they have all these inspiring people on the podcast where they talk about different things.
The latest episode I actually listened to in the car this morning when I was taking my son to school. And they were talking about mission statements, which sounds hokey. But at the same time, I was really thinking about it and it really made me think about what I’m about and what my company’s about and things like that. And so, I highly recommend that you listen to these and really take them seriously and think about where you’re going with what you’re working on, both personally and professionally. And make up that mission statement or do the other things where you’re investing in yourself. And I’ll put links to both of those up on the podcast, or on the show notes, so that you can get them. But yeah, those are two of the podcasts that I’ve been listening to.
And another pick, and this is one that I know Josh picked in the past, but it’s something that I started watching way before he picked it. And then I realized that it was something my wife would enjoy. And so, we’ve actually been watching ‘The Adventures of Merlin’ on Netflix and really, really been enjoying it. We’re halfway through the second season, which means that we’re about halfway to where we can pick it up when it starts airing again on SyFy in January. So anyway, those are my picks. And we’ll turn it over to Aaron.
AARON: Can I ask a stupid question?
AARON: So, how many of you guys own an iPod?
CHUCK: I do.
DAVID: I do.
JOSH: I do.
AARON: Really? I haven’t owned an iPod since I got an iPhone.
JOSH: I haven’t used mine since I got my iPhone. But I still own it.
AARON: So, are they going to stop? I’m just wondering. Is there going to be some day where the name podcast doesn’t make sense anymore? People will be like, what is a podcast?
DAVID: You know, I was podcasting back when the proper name for it would have been zunecasting.
JOSH: So Aaron, I got to know. What is the verb for entering a phone number into your phone?
AARON: I don’t know, type [inaudible]?
JOSH: You dial a phone number.
AARON: Oh, okay.
DAVID: That was willful.
AARON: That was willful [inaudible].
JOSH: How long since you have actually inserted your finger into a hole in a dial on a rotary phone?
AARON: A very, very long time.
AARON: I was just curious. Alright, alright.
DAVID: No, it’s valid. I genuinely think that mindshare has been taken for podcasting. When I first heard about podcasting, I was like, what the hell is this? It’s this big thing. Everybody’s doing it. Oh. It’s putting an mp3 on a freaking website. That’s all it is.
CHUCK: [Chuckles] Yeah, with an RSS feed. That’s it.
DAVID: That’s it. That’s it. They’ve come up with a name for it. But no, they’ve totally captured mindshare. You don’t buy facial tissues anymore. You buy Kleenex.
DAVID: Even if you don’t buy Kleenex brand, you buy Kleenex.
AARON: Okay, okay, so, picks. First off, George Foreman Grill.
AARON: I love that thing. You can make panini. You can make steak. [Laughs]
JOSH: It’s really good for hamburgers.
AARON: Yes, great for hamburgers. I love it. [Laughs] And then my nerd picks are, let’s see. So, I’ve been reading. I actually just finished the book ‘Pragmatic Thinking and Learning’. And I really enjoyed that book. So, it’s about, I don’t know, improving how you think and learn. That was really creative. [Laughs] But yeah, I learned to be a more efficient me, which I enjoy.
And let’s see. A couple of programs are, one thing I really like is Fantastical, Fantastical. It’s just a thing for entering stuff into your calendar very easily. One of the important things in this book was well, you need to try to reduce distractions or try and reduce context switches. So, if you’re switching back and forth between your, switching between your editor and your web browser or whatever, is a large context switch and you’ll lose track of what you’re doing. So, a lot of times I want to enter crap into my calendar or into to-do lists and I would switch over to my calendar, enter it in, and then forget what I was doing. I like Fantastical because it’s just a really low overhead to adding things to do. So, that’s my picks.
DAVID: Did you, I keep interjecting. Sorry. But did any of you guys see the study that came out a couple of weeks ago that when you walk through a door, it actually wipes your brain?
AARON: I saw that.
DAVID: You literally, you talk about you go from one room to the other and you forget why you’re there. And it turns out that when you physically change rooms your brain actually does a hardware context switch that says, “New room, new thinking.” And quite often, in that dump, you lose the thing that you were, you forget the reason you left and went to the other room. Yeah, mind-blowing. Literally mind-blowing, or mind-[inaudible].
CHUCK: So, if you jump up, run out of your office and into the bathroom, that’s four context switches before you come back?
DAVID: You know what? I had a really, really lousy meeting with a client. And so, I just went in and out of my front door 73 times and couldn’t remember it after I was done. It was awesome.
DAVID: It’s like ‘Memento’. You can do it at home.
CHUCK: Alright, Avdi, what are your picks?
AVDI: So, let’s see. My first pick is if you’ve been following Ruby blogs for a few years, you’re probably familiar with Reg Braithwaite’s writings. And a while back, he had this series of writings on combinators in Ruby, various K combinator and Y combinator. And for my money, one of the most interesting and mind-expanding collections of Ruby writing out there, and it also gives you a great intro into a lot of functional programming thought. And he recently and quietly released the whole series as an eBook for the paltry sum of $5. And so, I highly recommend that. I picked it up and I’m going to finally finish reading the ones that I missed.
AARON: What’s the name and where do you get it?
AVDI: It’s called ‘Kestrels, Quirky Birds, and Hopeless Egocentricity’. And it is at leanpub.com/combinators. And that will be in the show notes as well.
AVDI: And I guess I’ll do a Netflix pick as well. My brainless, end of the day show lately has been ‘Burn Notice’. I just started watching that. And it’s like a more violent MacGyver.
AVDI: It’s like a MacGyver where he’s always putting together bombs with duct tape and stuff like that, but he’s not as afraid to actually blow people up. And it’s entertaining. And I just recently realized that one of the main characters is Bruce Campbell. And I don’t know how it took me this long to realize that, but that made it even more awesome. So, ‘Burn Notice’ on Netflix.
CHUCK: Alright. Thanks, Avdi. I just want to thank Aaron again for coming in and being a guest on the podcast.
DAVID: Yes. Yes.
CHUCK: We haven’t had him for a while. And it’s always nice to hear from him. He’s entertaining and intelligent, which is a bonus. So, thanks again for coming, Aaron.
JOSH: And he looks damn good in a suit.
AARON: [Laughs] Thank you. Thank you.
CHUCK: He looks damn good in a beard, too.
DAVID: He’s looking pretty sweet in the Skype picture, let me tell you.
CHUCK: Alright. A few little business items here. You can get the show notes at RubyRogues.com. Also, you can leave us a review in iTunes, which is something that we really appreciate. And finally, we said that we were going to wait until around Christmas, and it’s around Christmas. So, we’ve decided on the book club. We’re going to do ‘Land of Lisp’. We’re going to be talking to…
CHUCK: Conrad, whatever his name is.
JOSH: Barski. Barski.
CHUCK: Barski. I’m so bad with names and I don’t have it right in front of me. So anyway, you can get it at LandOfLisp.com. Is that right?
DAVID: Yes. And there’s a sweet video.
CHUCK: And a comic.
DAVID: But you want to watch the video.
DAVID: I eat parentheses for breakfast. And if my program isn’t done, I eat parentheses for lunch. [Laughs]
CHUCK: Interesting. Anyway, I haven’t seen the video. So yeah, go check that out. Get the book. We’re going to be talking to him around the 22nd of February I think is what we were discussing with him. So, that’s the plan. And then we’re going to do ‘Crafting Rails Applications’ sometime after that in March or April. So, we’ll get you more details about that. But it was really a close race between the two. And ‘Land of Lisp’ just came ahead by a nose. So, that’s the order we’re doing them in. And it should be really interesting to talk to Conrad and then to talk to Jose.
DAVID: To have the Ruby community come out and say we want to hear a book on Smalltalk, now we want to hear a book on Lisp. I love you Ruby community.
DAVID: I love you so much.
CHUCK: Well, I think it says a lot about the community just in the way that we want to challenge the way that we think about our code.
DAVID: Yes. Yeah.
CHUCK: So, it really is an interesting thing.
AVDI: I’m voting for Intercal next.
DAVID: If somebody votes that we do a .NET book, I might change my theme.
DAVID: Then it’s time to stab somebody in the community.
CHUCK: That’s a regression, huh?
DAVID: Yeah, exactly. Learn Lisp because it’ll make you a better programmer in whatever language you’re using. Learn .NET because [sighs]. No.
CHUCK: Alright, well we’re going to wrap this up. And we’ll catch you next week.