CHUCK: We have everyone here?
JAMES: This is everyone we’re going to have, yes.
JOSH: Avdi is two-timing us. [Laughter]
JAMES: That’s accurate. [Laughter]
JOSH: He’s cheating on us with another podcast.
DAVID: I’m not taking him back.
JAMES: [Chuckles] You say that every time, David.
DAVID: I know. I know. I know.
[Hosting and bandwidth provided by the Blue Box Group. Check them out at BlueBox.net.] [This podcast is sponsored by New Relic. To track and optimize your application performance, go to RubyRogues.com/NewRelic.]
[This episode is sponsored by SendGrid, the leader in transactional email and email deliverability. SendGrid helps eliminate the cost and complexity of owning and maintaining your own email infrastructure by handling ISP monitoring, DKIM, SPF, feedback loops, whitelabeling, link customization and more. If you’d rather focus on your business than on scaling your email infrastructure, then visit www.SendGrid.com.]
[This episode is sponsored by Code Climate. Code Climate’s new security monitor alerts you immediately when vulnerabilities are introduced into your Rails app. Sleep better knowing that your data is protected. Try it free at RubyRogues.com/CodeClimate.]
CHUCK: Hey everybody and welcome to episode 130 of the Ruby Rogues podcast. This week on our panel, we have David Brady.
DAVID: Three Ruby Rogues and Aja Hammerly walk into a bar.
CHUCK: James Edward Gray.
JAMES: In this episode, I will be visualized as the invisible [study].
CHUCK: Josh Susser.
JOSH: Apparently this morning, it’s Josh versus the Kombucha.
CHUCK: I’m Charles Max Wood from DevChat.TV. And all I have to say is I’ve got stickers with all of the album art from the different shows that we do. So if you want one, find me at RubyConf. And we also have a special guest and that is Aja Hammerly.
AJA: Good morning from cloudy Seattle.
JOSH: Hey, good morning.
CHUCK: Is there any other kind?
JAMES: It’s cloudy here, too.
AJA: Occasionally, it gets sunny, but just not very often.
CHUCK: Very green up there, though. It’s beautiful. So, do you want to introduce yourself since you haven’t been on the show before?
AJA: Oh, sure. So, I am a developer and I have a deep desire to get more people to use data when making decisions about their sites and about their projects. And I did a talk at GoGaRuCo this year on data visualization, specifically really quick and easy back of the envelope style of data visualization.
JAMES: Whoa! I’m about to rage quit this call.
CHUCK: Can you do back of the envelope data visualization?
AJA: Yeah, you can. Well yeah, of course you can. You can draw easy pictures that just show generally what’s going on. The point is that data visualizations don’t always have to be really high quality and shiny with lots of zooming animations. Sometimes, all you need is a basic picture of what’s going on with your site or your project.
JOSH: Who let Edward Tufte on this call? [Chuckles]
DAVID: Oh, she actually said that in her talk. She said, “This is not that talk.” And her slide was a graph.
JAMES: That’s one of my favorite parts of that talk.
JOSH: [Chuckles] Yeah. That was the Napoleonic march one.
AJA: Yes, it was. I’m not a designer, and I very much respect Tufte and all the people who can make things that are almost that cool, but that is not me. And I can still do data visualization, and that’s kind of my point.
JAMES: Yeah, there was this super complicated slide with all this crazy stuff on it, and she was like, “This is not that talk.” It was great.
JOSH: Okay. So, if you can summarize your talk in 30 seconds, what would it be?
AJA: Yeah, it would probably have to be a picture. It would probably be that wonderful slide I have of the fourth-grade class crayon drawing of who likes Justin Bieber and who doesn’t.
JOSH: [Chuckles] Okay.
JAMES: Another awesome picture. What I really liked about the talk is how well you broke it down. You had a great section at the beginning on ‘Why would you do this?’, which I thought was really great. And then you had lots of ‘How could you go about this?’, and maybe you snuck a little ‘when’ in there. I thought it was interesting how many axes you managed to hit. So, why don’t you give us the ‘Why would you do this?’ spiel? Because it was great.
AJA: So, I think that as engineers, or, if you saw Ryan’s talk from Cascadia, as craftsmen, we tend to be very specialized. Things that seem blatantly obvious to us aren’t obvious to lots of other people, including the people who are making decisions about what is important and what is not. At every job I’ve had, I’ve found it’s much easier to convince someone that their current path is incorrect, or that we need to do some optimization or spend some time cleaning up code, or any number of other things, if I can show them a picture of the benefits and how it will affect them, or draw a data structure or an architecture diagram. If I can show someone a picture, I can convince them, and I can explain things to them a lot more easily. So, that’s why I think we should use these tools more often, as opposed to just spouting numbers at each other. I’ve also worked on a number of teams with people from all over the world, and I found that pictures were less likely than language to be misinterpreted a lot of the time. Even when my teammates spoke perfect English, there were still idioms and assumptions based on our US-centric view: we didn’t realize we were making those assumptions, and they didn’t realize they were working from a different set of assumptions. But when we all looked at a picture together, it was really easy to see what the point was and what we needed to do next, just a general diagram of the situation. I also found that pictures help defuse panic, especially when things are going really badly. I’ve mostly worked on the web, and when things are going really badly on the site and everything seems to be failing, pictures and New Relic graphs have been great for that in my experience.
It just helps people see what the actual problem is, as opposed to taking wild guesses because they really want to solve the problem and they haven’t spent the time to look at the data, because they’re afraid to take the time to crunch the numbers.
JAMES: I think that step of actually taking the time to look at some numbers is a really good thing. I don’t know if everyone here saw, but Mavericks, Apple’s new operating system, is out now. My favorite part of any new operating system release from Apple is the 24- to 25-page write-up that John Siracusa does on Ars Technica. In this particular one, he talked a lot about the things they did in Mavericks to speed it up and to increase battery life, energy savings and stuff like that. One of the killer sections of that review, which I’ll link to in the show notes, is the page on memory compression; if you only read one page, read that one. It’s basically all about how the things we think about computer performance are so wrong, and it’s really interesting. In Mavericks, when the system runs out of memory, it starts compressing what’s in RAM. When you think about that from a performance standpoint, it almost sounds horrible. It has run out of memory while it needs to allocate more, so this is a time-sensitive situation, and the computer stops, compresses a bunch of data using a compression algorithm, and puts it back in RAM to make room for the new stuff. And whenever it has to get that data back out, it’s going to have to inflate it, and blah, blah, blah, all this. It turns out that drastically speeds up the computer. The reason is that if it’s running out of RAM, your next choice is swap, and swap, even on a flash drive, is hundreds of times slower than RAM access. So even though you’ve got to stop and compress, it’s still faster than going to swap. It’s a very interesting read. And I think it ties into what you were talking about: it helps to just stop and measure, because sometimes we think we know what’s going on, but when you actually look at the numbers, you may find that doing more work, like Apple is doing, is actually faster, which teaches you something you didn’t know before.
DAVID: I need to do some analytics like that on my current machine. When I switched away from a Macintosh last October, I ended up buying a new PC. And of course, I’m used to Mac and the Apple tax, Macintosh prices, so when I went into the computer store and she asked me what my budget was and I said, “Well, I’d like to say under $2000,” she just laughed at me and said, “I can’t fill that budget.” And I’m like, “Oh, so this is my playground now. I’ll just take the best of everything.” So, she built me this gigantic monster PC that I’m running Linux on. It’s the first computer since I was in high school where I’ve maxed out all of the RAM on the motherboard. So, I’ve actually got 32 gigabytes of RAM in my Linux box and I no longer have the swapping problem. We’re just unconsciously used to things getting laggy and slow because we’re swapping to the page file or to swap on disk. I never have that. But I have other weird things that happen. Like when you try to garbage collect 16 gigs of memory at once, everything goes. All bets are off. And so, I just realized that I need to take measurements on this to find out why my computer is still slow. There was a point in there, but I’m not sure what.
JOSH: Can you talk about how you’re visualizing that data? [Chuckles]
DAVID: Well, yeah. That’s exactly what it is, right? It’s useless to just run DTrace and spit the output to a log file. I’m not going to be able to do anything useful with that. I’m assuming that I’m losing performance in garbage collection, but that’s just an assumption. If I’m not going to base it on data, then, what did Aja say in her talk? Maybe I should use Java. That should be the solution for everything. [Chuckles]
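If the process in question is a Ruby one, David’s “I’m losing performance in garbage collection” assumption can be checked cheaply with the standard library. The sketch below uses `GC::Profiler` and `Benchmark.realtime` to report what share of a block’s wall-clock time went to garbage collection. The `allocate_garbage` workload and the `gc_share` helper are hypothetical stand-ins, and the numbers will vary by machine and Ruby version.

```ruby
require "benchmark"

# Hypothetical workload: allocates many short-lived strings so GC has work to do.
def allocate_garbage(n)
  n.times.map { |i| "item-#{i}" * 10 }.size
end

# Runs the block and reports wall-clock seconds, seconds GC::Profiler
# attributes to garbage collection, and GC's share of the total.
def gc_share(&work)
  GC::Profiler.enable
  GC::Profiler.clear
  wall = Benchmark.realtime(&work)
  gc = GC::Profiler.total_time # seconds spent in GC since clear
  GC::Profiler.disable
  { wall: wall, gc: gc, share: wall.zero? ? 0.0 : gc / wall }
end

stats = gc_share { allocate_garbage(200_000) }
puts format("wall %.3fs, gc %.3fs (%.1f%% in GC)", stats[:wall], stats[:gc], stats[:share] * 100)
```

Measuring first, then graphing the numbers over several runs, is what turns the assumption into data.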
JAMES: Nice. Yeah, it’s surprising what seeing the data can do sometimes. A lot of times, we look at a problem thinking one thing is the most significant. But once we see the data, we realize we could tweak that section all day long and it still wouldn’t move the needle, because that’s not where the big problem is. Just having those numbers in front of you helps, and it’s way better in a graph than as raw numbers, because you can actually see the spikes and things like that.
CHUCK: Yeah, one thing I’ve seen in a lot of cases, though, is that once I start gathering the information to visualize it, I start having the Aha moment. Just seeing the numbers, and the shock value of “Oh, that’s different from what I expected,” starts challenging my preconceptions before I even get to the point where I can draw a picture of it and say, “So this is the big picture, and this is what I expected overall.” So, that’s always interesting to me, when I start making discoveries right away.
AJA: Yeah, I’ve totally been there. Looking at the numbers is often enough for me. But what I’ve found at some of my previous jobs is that the first thing that happens when I go show someone the numbers is they say, “What are you trying to say here?” That’s where the data visualization stuff really kicked in: I gave them something they could digest in a couple of seconds, as opposed to throwing numbers and words at them for a while and getting them very, very confused.
DAVID: Yeah. Go ahead.
JOSH: Well, I was going to say, one of the great examples of the power of visualization is Feynman diagrams. Are people familiar with those?
DAVID: I’m familiar with them, but I never learned how to read them. They’re beautiful and weird.
CHUCK: [Laughter] I don’t know. I don’t know what they are.
JOSH: So, a Feynman diagram is a visual tool developed by Richard Feynman. It’s basically a way to visualize quantum interactions, like between electrons and positrons and quarks, by graphing them in space-time and then figuring out what the actual interactions between the particles are. Being able to visualize things this way makes it so much easier to figure out what’s going on. They were a pretty significant breakthrough.
DAVID: Yeah. And they’re squiggly lines and springs. They look great.
JOSH: [Chuckles] Yeah. I was actually reading recently that somebody came up with a more powerful thing.
DAVID: Oh, cool.
JOSH: That is some ridiculously high-dimensional picture. But anyway, the point was that here’s a visual representation of something that if you look at the Wikipedia page for it, it’s just tons and tons of equations, or you can look at the cute little picture.
DAVID: We talk about how a picture is worth a thousand words. But more importantly, a picture is worth a thousand words all at once. A bit from Aja’s talk that I absolutely loved, it was a joke and it got a big laugh. Aja, you talked about how the site’s running slow so the engineers go in and they fight. And the first person shouts out, “We should use Java!” And you go in later and you bombard your manager who’s this Labrador Retriever looking at you puzzled and you just said, what did you say? You puked numbers?
AJA: Barfing numbers, yeah. And I’m prone to doing that based on a real experience.
DAVID: Yeah. There’s a sales adage that says a confused mind always says ‘no’. And what I loved was how you tied that in, that you go in, you barf numbers at your manager, and the manager says, “So, we should use Java?” And you realized, “No, you’ve completely missed my point.” And yeah, when you can bring in a picture and say, “Here’s your problem,” they can go, “Oh, okay. We need this.”
CHUCK: Yup. You can also use it for when they tell you ‘no’ and then you come back and say, “You remember I showed you that graph that was rising exponentially? And now, you’re getting mad at me because we’re behind?” [Chuckles]
CHUCK: Not that I’m speaking from experience or anything. [Chuckles]
DAVID: I’ve had managers that didn’t want to add extra servers and I’ve been able to go to them and say, “I can’t make the software go any faster. The problem’s on the network at this point and the only thing that can speed things up is more servers.” Or gone and said, “The problem is actually the network wire to the database and no amount of adding servers is going to solve that problem.”
JAMES: So, I think we’ve pretty much sold the why. Now, why don’t we take Aja’s talk a little out of order? At the end of the talk, you gave some general techniques and tips for data collection, getting the data that you’re going to visualize. Do you want to tell us what your ideas were on that?
AJA: I tend to use Ruby because that’s my most comfortable programming language. If the data is in text files, I fall back on string split and regexes. Those two tools are fantastic and they work really well most of the time. But frequently, I’ve had to pull data from other sources, oddly enough often web pages. So, I’m pulling data out of web pages to graph or diagram something related to a competitor’s product or something. For that, I really like the Mechanize gem, and I really like Nokogiri for parsing HTML. Both of those tools have a bit of a learning curve, but once you learn them, they really do start feeling like your everything tool for data extraction. And I was really fortunate that at GoGaRuCo, the talk right after mine was on Nokogiri and Mechanize and how those tools work. I try to use the simplest thing that could possibly work, because I don’t want to spend too much time on the extraction; I’m really curious to see the data and make pretty pictures. But most of the time, yeah, Nokogiri and Mechanize are my tools of choice for web-based stuff.
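Aja’s “string split and regexes” approach for text files can look something like the following. The log format, the `LOG` sample, and the `LINE` pattern are made up for illustration; the sketch uses `String#split` to break the text into lines and a regex with named captures to pull out fields, skipping any line that does not match.

```ruby
# Made-up log sample: date, time, HTTP verb, path, status, response time.
LOG = <<~TEXT
  2013-10-28 10:00:01 GET /lessons 200 123ms
  2013-10-28 10:00:02 GET /lessons/42 200 87ms
  2013-10-28 10:00:03 POST /answers 500 1503ms
  garbage line that does not match
TEXT

# Named captures make the extracted fields self-documenting.
LINE = /\A\S+ \S+ (?<verb>[A-Z]+) (?<path>\S+) (?<status>\d{3}) (?<ms>\d+)ms\z/

# Splits text into lines and keeps only the ones the pattern recognizes.
def parse(text)
  text.split("\n").filter_map do |line|
    m = LINE.match(line.strip)
    next unless m

    { verb: m[:verb], path: m[:path], status: m[:status].to_i, ms: m[:ms].to_i }
  end
end

rows = parse(LOG)
slowest = rows.max_by { |r| r[:ms] }
puts "#{rows.size} requests, slowest: #{slowest[:path]} (#{slowest[:ms]}ms)"
```

`filter_map` needs Ruby 2.7 or later; on older versions, `map` followed by `compact` does the same job.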
JOSH: Cool. And what about... in the talk, you described several different libraries that you use for doing visualization. A lot of that comes down to, I guess, what browser technology you’re using. Are you using SVG or what have you?
AJA: I tried to focus in the talk on technologies for quick and dirty diagrams. I specifically did not talk about tools like D3 and others that can make the types of diagrams you want to show your customers. Because my experience has been that if the developers get their hands on those tools, magically, graphs start showing up in all sorts of crazy places, and then their designers freak out.
JAMES: You have to use them.
CHUCK: So, most Graphviz outputs that I’ve seen are typically more entity diagrams or workflow diagrams as opposed to data visualization stuff, which I tend to think of more along the lines of graphs and charts. Does it do the data stuff?
AJA: It doesn’t, but I think a lot of people don’t realize that they have dependency trees hidden in their code. I worked at a company that did education, and we had a ginormous dependency tree hidden in the curriculum we delivered via the web. We ended up visualizing that with Graphviz internally for testing purposes, so that we could see how the different lessons connected to each other. On the project I’m on right now, we have some crazy inter-server interactions, and we could easily use something like Graphviz to show how data moves from one end of the system to the other. I think people hear “visualization” and think, “I’m going to make a pretty line chart.” And they don’t realize that there are other kinds of visualizations that can help them out.
JAMES: Yeah, I want to stress that a little bit. In profiling, one of the things we often do is show the timings of various things, which, obviously, is very helpful. But another thing that can be just as helpful a lot of the time is seeing the call stack. If you can just see the call stack and go, “Whoa! That thing gets called every single time through here,” that’s often almost as effective as seeing the timings. And Graphviz is totally capable of graphing that call stack.
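One way to get the call-graph view James describes without any gems is Ruby’s built-in `TracePoint`. This sketch counts caller-to-callee edges while a block runs and prints them as DOT, with call counts as edge labels, ready for Graphviz. `Report`, `call_graph`, and `edges_to_dot` are hypothetical names, and because it listens only to the `:call` and `:return` events, it sees Ruby-defined methods but not C methods like `String#gsub`.

```ruby
# Hypothetical code under inspection.
class Report
  def render
    row("a") + row("b") + row("c")
  end

  def row(text)
    escape(text) + "\n"
  end

  def escape(text)
    text.gsub("<", "&lt;")
  end
end

# Counts caller -> callee edges for Ruby-defined methods called inside the block.
def call_graph
  edges = Hash.new(0)
  stack = []
  tp = TracePoint.new(:call, :return) do |t|
    if t.event == :call
      label = "#{t.defined_class}##{t.method_id}"
      edges[[stack.last, label]] += 1 if stack.last
      stack.push(label)
    else
      stack.pop
    end
  end
  tp.enable { yield }
  edges
end

# Emits the edges as DOT, labeling each edge with its call count.
def edges_to_dot(edges)
  body = edges.map { |(from, to), n| %(  "#{from}" -> "#{to}" [label="#{n}"];) }
  (["digraph calls {"] + body + ["}"]).join("\n")
end

puts edges_to_dot(call_graph { Report.new.render })
```

Piping the output through `dot -Tpng` gives the picture; a heavily weighted edge is the “that thing gets called every single time” moment.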
DAVID: I have a Graphviz question for you, which is that my one annoyance from it is that it likes to lay things out on its own and I haven’t figured out a good way to tell it to stop that. Usually, on a small graph it’s fine, I’m okay with it. But in your talk, you talked about graphing out the dependency chain of your curriculum and you could look at your chart and you could clearly see that here’s the 100-level courses, here’s the 200-level, here’s the 300, here’s the 400-level courses. But if you had a 400-level course that only had two dependencies that were in the 200-levels, Graphviz would probably try and put that in with the 300-level courses. Can you manipulate the layout when you are graphing something and say I want these grouped together?
AJA: You can. The technique for doing that is clustering. I did a previous talk at RubyConf a couple of years ago that talks about how to do clustering because I’ve used it in some of my other visualizations. It didn’t make the talk for GoGaRuCo because I had a lot of other stuff I wanted to say. But if you look up my other talk on Confreaks, I’ll post a link in the show notes that shows how to do clustering. I will say that if you’re tempted to use clustering, you should think twice about it.
AJA: Generally, if Graphviz is laying something out a certain way, it’s telling you that perhaps your assumptions about this case, that 400-level course, may be wrong. And maybe it actually is a 300-level course that’s been misnumbered.
AJA: So, the layout may be trying to tell you something. And frequently, when I’ve done these huge layouts with hundreds and hundreds of nodes, it actually does tell me something, that there are these specific things that, “That doesn’t belong there. Oh wait, maybe it actually does.”
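Clustering in Graphviz works through subgraphs: in the `dot` layout, a subgraph whose name starts with `cluster` is drawn as its own boxed group. The sketch below, with made-up course numbers and a hypothetical `clustered_dot` helper, groups courses by their hundreds digit so a 400-level course with only 200-level prerequisites stays in the 400 box. Given Aja’s caveat, it may be worth rendering once without clusters to see what the layout is telling you.

```ruby
# Made-up prerequisite map: course => courses it depends on.
COURSES = {
  "CS101" => [], "CS102" => ["CS101"],
  "CS201" => ["CS102"], "CS301" => ["CS201"],
  "CS401" => ["CS201", "CS102"]
}

# Builds DOT text with one "cluster_NNN" subgraph per course level.
def clustered_dot(courses)
  by_level = courses.keys.group_by { |c| c[/\d/] + "00" } # "CS401" -> "400"
  lines = ["digraph courses {"]
  by_level.sort.each do |level, names|
    lines << "  subgraph cluster_#{level} {"
    lines << %(    label="#{level}-level";)
    names.each { |n| lines << %(    "#{n}";) }
    lines << "  }"
  end
  # Edges go from prerequisite to the course that needs it.
  courses.each { |c, deps| deps.each { |d| lines << %(  "#{d}" -> "#{c}";) } }
  lines << "}"
  lines.join("\n")
end

puts clustered_dot(COURSES)
```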
JAMES: So, I have to confess that I learned something new from Aja’s talk, and that’s the existence of the graph gem. It’s a testament to Graphviz that the DOT file format it reads is very simple. I’ve used it a lot in the past and I just stick things into it. It’s always simple to get those first two nodes in there, but by the time I’m throwing 30 nodes in, my code is usually getting pretty ugly. I’ve always just done it manually and did not realize that we have this awesome graph gem in Ruby that basically paints a pretty DSL over the top of this file format. So, it’s really cool. And if you don’t know about it, you should check it out, because it is super handy.
AJA: One of the things I really like about the gem is that it’s really small. And I know people are all saying, “You should read code. Read other people’s code to get better at programming.” I think the graph gem is a really great example of that because it shows how to make a DSL but it’s a really simple DSL and the code itself is really easy to read. And so, in addition to being handy, it’s also I think a great example of how to make a very small gem to solve a very specific problem.
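To show the idea James and Aja describe, a small DSL layered over DOT, here is a toy version built on `instance_eval`. It is deliberately not the graph gem’s API (the gem’s own README is the place for that); `TinyGraph` and its `edge` method are hypothetical.

```ruby
# Toy DSL: the block is evaluated against a TinyGraph instance,
# so bare `edge` calls inside it record edges.
class TinyGraph
  def self.build(name, &blk)
    g = new(name)
    g.instance_eval(&blk)
    g.to_s
  end

  def initialize(name)
    @name = name
    @edges = []
  end

  # edge "a", "b", "c" records a -> b and a -> c.
  def edge(from, *tos)
    tos.each { |to| @edges << [from, to] }
  end

  def to_s
    body = @edges.map { |a, b| %(  "#{a}" -> "#{b}";) }
    (["digraph #{@name} {"] + body + ["}"]).join("\n")
  end
end

dot = TinyGraph.build("deps") do
  edge "rails", "activesupport", "actionpack"
  edge "actionpack", "rack"
end
puts dot
```

The appeal is the one James mentions: at 30 nodes, the block still reads like a list of relationships rather than string concatenation.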
CHUCK: I want to change topics a little bit. One of the issues that I’ve seen with collecting data and even visualizing it is that when it’s all said and done and you’ve put it up in a way that makes it easy to look at and draw conclusions from, the people looking at it draw different conclusions from it. So, is there a good or standard way of interpreting the data once you’ve visualized it?
AJA: I don’t know. I’ve run into that problem. And I think for me, a lot of the times, the picture, the graph, the chart is just the starting point for the conversation. And I found that if people are drawing different conclusions than I am, either I’ve missed something or perhaps the visualization isn’t clear enough or perhaps we’re dealing with different underlying assumptions. And sadly, words are one of the only ways to break down those underlying assumptions and start figuring out how people are interpreting data based on their past experience. I like to use colors a lot to highlight specific sections and sometimes I think that can draw people’s attention to the part that I’m really focused on. But yeah, that totally happens. I have had many a discussion about what a particular New Relic graph was saying about the progress of our site. I’m saying that, “No, no, it means that everything is fine.” And someone’s saying, “No, we need to put more indexes in.” But it’s just how you interpret the data. And I think that is a problem.
JAMES: You’ve politely explained to them and of course, they’re allowed to have their own opinion. They’re wrong, but they’re allowed to have their own opinion. [Laughter]
CHUCK: Those darn business people.
JAMES: You had a great section in your talk, Aja, about colors, including this cool site you showed off. Do you want to talk about that a little bit?
AJA: So, I am design-challenged. When I was doing the first version of this talk a couple of years ago, I found Colorbrewer2.org, I believe. It is a fantastic site for showing different color schemes and how you can use colors to indicate meaning, which I think is something a lot of people miss. Those charts we see on election night, a map of your state or the nation in varying degrees of blue and purple, or sometimes just blue and red and the spectrum in between, are showing that there is a spectrum: different places sit at different ends of it, and a lot of places are in the middle. Whereas you can use different colors to show that categories are completely unrelated. There’s no progression. There’s no spectrum here. Or you can use shades of one color, like blue, to show that something has varying degrees of impact, or that this population has more of trait X and that population has less. Colorbrewer2 makes you state your intention for your color scheme and then helps you narrow it down based on how many colors you need. Do you need something that’s colorblind accessible? Do you need something that’s photocopier-friendly? Then it gives you a list of these Brewer color schemes, which are supported by Graphviz and some other tools. You can choose one and know that your colors are communicating what you want them to, rather than just picking pretty colors at random because you like them, which is what I tend to do without help.
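Graphviz can use Brewer schemes directly through its `colorscheme` attribute: with `colorscheme=blues5`, a color value of `1` through `5` picks a step of that sequential palette. The sketch below buckets a numeric value per node into those steps. The page names and error rates are made up, and the `bucket` and `colored_dot` helpers are illustrative assumptions, not part of any library.

```ruby
# Made-up per-page error rates to shade on a sequential (light-to-dark) scale.
PAGES = { "home" => 0.01, "search" => 0.08, "checkout" => 0.20, "login" => 0.03 }

# Maps a value in [min, max] onto 1..steps, the index format Brewer schemes use in Graphviz.
def bucket(value, min, max, steps)
  return 1 if max == min

  (((value - min) / (max - min)) * (steps - 1)).round + 1
end

def colored_dot(values, scheme: "blues", steps: 5)
  min, max = values.values.minmax
  lines = ["digraph pages {", %(  node [style=filled, colorscheme=#{scheme}#{steps}];)]
  values.each do |name, v|
    idx = bucket(v, min, max, steps)
    lines << %(  "#{name}" [fillcolor=#{idx}];)
  end
  lines << "}"
  lines.join("\n")
end

puts colored_dot(PAGES)
```

A sequential scheme suits a single “more versus less” measure like this one; unrelated categories would call for a qualitative scheme instead, which is exactly the intent question Colorbrewer2 asks up front.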
JAMES: Yeah, I thought that was really cool. It never even occurred to me to think about colors at the level of intent: oh, if I want to show something on a spectrum, then it should be colors that naturally blend from one to the other. So it’s really cool. It’s a neat tool. You should play around with it.
JOSH: Do you have other tricks for things like showing error bars? Is there some way you like to do that?
AJA: I have a science background, so I just fall back on regular scientific error bars, little tiny bars with little lines on the ends of them. I think there are probably better ways to do it. I’m still learning this data visualization game, and I keep trying to get better at it. But I just fall back on science because that’s what I know.
JOSH: [Chuckles] Okay. So, you use the standard error bars.
JOSH: I’ve seen some graphing packages that have error bars just built in as one of the tools.
AJA: Yeah, I don’t know if Highcharts does that. I know that there are others that do. And I fall back on the basic charting tools built into Numbers or Excel on a fairly regular basis for internal stuff that doesn’t need to be heavily scripted. But I’d be surprised if Highcharts doesn’t support that out of the box.
DAVID: It supports a candlestick chart. Is that the same kind of error bar?
JOSH: Oh, that’s a pretty standard one.
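For the standard error bars Aja mentions, one sketch is to compute a mean and sample standard deviation per group in Ruby and hand Highcharts a column series plus an `errorbar` series of `[low, high]` pairs. The `errorbar` series type requires the highcharts-more module; the sample timings and the helper names are made up, and whether one standard deviation is the right bar for your data is a judgment call.

```ruby
require "json"

# Made-up repeated measurements (ms) for two versions of a page.
SAMPLES = { "v1" => [120, 130, 125, 140], "v2" => [95, 100, 90, 105] }

def mean(xs)
  xs.sum.to_f / xs.size
end

# Sample standard deviation (n - 1 in the denominator).
def stddev(xs)
  m = mean(xs)
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

# Builds Highcharts options: columns for the means, error bars for mean +/- 1 sd.
def error_bar_chart(samples)
  means = samples.values.map { |xs| mean(xs).round(1) }
  bars = samples.values.map do |xs|
    m = mean(xs)
    s = stddev(xs)
    [(m - s).round(1), (m + s).round(1)]
  end
  {
    title: { text: "Response time (ms)" },
    xAxis: { categories: samples.keys },
    series: [
      { name: "mean", type: "column", data: means },
      { name: "+/- 1 sd", type: "errorbar", data: bars }
    ]
  }
end

puts JSON.pretty_generate(error_bar_chart(SAMPLES))
```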
JAMES: Josh talking about error bars got me thinking about another interesting piece of that Ars Technica review. At one point, they talk about how crazy CPU usage is: we ask the computers to start and stop all the time. One of the new things Mavericks does is put those chunks of work together so there’s less starting and stopping. One of the ways they visualized that, which was neat, was a chart that is a bar chart where utilization kicks in, but with almost a line graph overlaid on top of it so that you can understand the ramp-up and ramp-down penalty. It just stuck in my head as a really neat way to convey the message. And that’s, I think, what we’re talking about here: if you can find the right way to show someone the data so they look at it and have their Aha moment, that’s really the goal you’re aiming for. It’s very helpful.
JOSH: Hey Aja, can you talk about some of the non-technical decision points that you make around the tools? I know that I’ve looked at Highcharts in the past and I seem to recall that the licensing terms or the pricing around it weren’t going to be compatible with the project I was working on. Is that something that you think about when you do this? Do you look at licenses and figure out what’s going to work with your project that way?
DAVID: Yeah, true.
AJA: And I know that
JOSH: Yeah, go ahead.
AJA: Go ahead, Josh.
JOSH: No, I was going to change subjects, so just keep
AJA: No, it’s
JOSH: Oh, okay. So I have a question about the API-level stuff. The last time I was doing much graphing, I found that there seemed to be two camps on how to ship the data around. One was, “Here’s a list of all of the x-coordinates and then here’s a list of all the y-coordinates.” And then something in the graphing package would figure out, “Okay, now I can draw something at the intersection of the x and the y.” And then there was the other camp: “Oh, here’s a list of all of my x-y pairs.” And it seemed like one or the other was good for particular kinds of visualizations. I haven’t dug into the Highcharts stuff. I’ve only taken a glance at it. Are there options for doing one or the other kind of data series? Or do you have a preference for that?
AJA: I like laying out pairs of x and y and letting the library figure out what my axis [screen] should look like, because I don’t necessarily want to make those decisions. Highcharts mostly does the data pairs, although you can specify the range on the axes if you care about that; otherwise it will pick intelligently from your data set. I prefer that, but I see the value in laying things out the other way. I like getting the data and then not having to worry too much about the layout once I have it. I can generally put things together into a list of pairs, or several lists of pairs, pretty easily with code. Then I let the library take over at that point and do the drawing and the layout and everything else for me, mostly because I’m lazy and I like being lazy and having pretty pictures show up anyway.
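The two shapes Josh describes, parallel x and y lists versus a list of x-y pairs, convert into each other with `Array#zip` and `Array#transpose`. This small sketch adds a length check, since `zip` would otherwise silently pad with `nil` or drop values; `to_pairs` and `to_lists` are hypothetical helper names.

```ruby
# Parallel lists -> list of [x, y] pairs.
def to_pairs(xs, ys)
  raise ArgumentError, "length mismatch: #{xs.size} vs #{ys.size}" unless xs.size == ys.size

  xs.zip(ys)
end

# List of [x, y] pairs -> [xs, ys]; transpose of an empty array would lose the shape.
def to_lists(pairs)
  return [[], []] if pairs.empty?

  pairs.transpose
end

pairs = to_pairs([1, 2, 3, 4], [10, 14, 9, 20])
p pairs # => [[1, 10], [2, 14], [3, 9], [4, 20]]
xs, ys = to_lists(pairs)
p xs # => [1, 2, 3, 4]
p ys # => [10, 14, 9, 20]
```

Either shape can then be serialized for whichever charting library wants it.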
JOSH: Okay. So what’s the laziest way that you found to just put a useful picture up?
AJA: So, the basic one: in my talk, I give the absolute minimum Highcharts example, and it is very, very simple. There’s very little data there. It’s a set of categories and a set of data points for those categories, and that’s it. I can make two arrays really quickly in Ruby and I don’t have to do anything after that. There’s a little bit of boilerplate, but that’s copy and paste. I will also admit that I have a Mac. I use Numbers, and generating a comma-separated file, opening it up in Numbers, and having it just draw the chart for me is really, really easy.
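A minimal version of “two arrays in Ruby plus a bit of boilerplate” might look like the sketch below, which puts the arrays into a Highcharts options object with `JSON.generate` and wraps it in a standalone HTML page. It assumes Highcharts is loaded from its public CDN and uses the `Highcharts.chart(id, options)` entry point; the `highcharts_page` helper, title, categories, and counts are made up.

```ruby
require "json"

# Wraps a category list and matching data points in a self-contained HTML page.
def highcharts_page(title, categories, data)
  options = {
    chart: { type: "bar" },
    title: { text: title },
    xAxis: { categories: categories },
    series: [{ name: title, data: data }]
  }
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head><script src="https://code.highcharts.com/highcharts.js"></script></head>
    <body>
      <div id="container"></div>
      <script>Highcharts.chart("container", #{JSON.generate(options)});</script>
    </body>
    </html>
  HTML
end

html = highcharts_page("Votes", ["Yes", "No"], [12, 15]) # made-up counts
puts html
# File.write("chart.html", html) and open it in a browser, or drop it somewhere shared.
```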
JAMES: Yeah, that’s a good point, I think. We actually had a conversation about this off the air, myself and Josh and Avdi. And Avdi was talking about how at one point in his career he had been forced to learn spreadsheets, actually learn them and do something with them. And how he felt that was quite the cheater’s tool, knowing how to use a spreadsheet, that they’re very good at things like this. And given a couple of sets of numbers, it’s typically a few clicks to bring up a pretty graph of that data. And Avdi was talking in that conversation about how a lot of times as programmers, we see it and we think, “Oh, I can write a script to do that,” and you probably can. But it may take you 30 minutes whereas in something like Numbers or whatever, charting is just what it does. And it does it very quickly and very well. So it’s a good cheat, especially when we’re talking about quick and dirty visualization.
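The spreadsheet cheat usually starts with a CSV file. Ruby’s `csv` library can produce one in a few lines; the column names and counts here are made up, and `to_csv` is a hypothetical helper. The resulting file can be opened in Numbers, Excel, or LibreOffice and charted in a few clicks.

```ruby
require "csv"

# Turns an array of rows (first row is the header) into CSV text.
def to_csv(rows)
  CSV.generate { |out| rows.each { |row| out << row } }
end

rows = [["day", "signups"], ["Mon", 12], ["Tue", 18], ["Wed", 9]]
puts to_csv(rows)
# File.write("signups.csv", to_csv(rows)) saves it for the spreadsheet app.
```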
JOSH: Okay. What about printing? Is there something that you like that has good support for printing?
AJA: So, Highcharts does have support for printing. I’ve never used it, because I’m frequently emailing links around. I’ll throw an HTML page somewhere where everyone can access it and send out a link to it. Once you’ve created the Graphviz file, the DOT file with the graph gem, those print just fine. And there are ways to set the size of your graphable area so that it can print on a specific type of paper. I generated one graph that needed a plotter to print, and it turned out that the only plotter I could get access to was three feet wide. So I had to set the boundaries so that I could print it at three feet wide by however long I needed it to be. That ended up being about eight feet long. It was awesome.
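Graphviz’s graph-level `size` attribute is given in inches. By default, a drawing larger than that is scaled down to fit; a trailing `!` also scales a smaller drawing up until one dimension matches. The sketch below wraps some DOT edges with a size matching the three-foot-wide, eight-foot-long plot Aja describes. `sized_dot` is a hypothetical helper, and a plotter driver may add its own margins.

```ruby
# Wraps DOT edge lines in a digraph whose drawing is bounded by width x height inches.
# fill: true appends "!" so Graphviz also scales small drawings up to the size.
def sized_dot(body, width_in:, height_in:, fill: true)
  size = "#{width_in},#{height_in}#{fill ? '!' : ''}"
  %(digraph plot {\n  size="#{size}";\n#{body}\n})
end

# 3 feet wide by 8 feet long = 36 x 96 inches.
dot = sized_dot(%(  "a" -> "b";), width_in: 36, height_in: 96)
puts dot
```

Saved as `plot.dot`, it can be rendered with something like `dot -Tpdf plot.dot -o plot.pdf` and sent to the plotter.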
AJA: It was really handy. We posted it on the wall. Printing is another time when I fall back on really basic spreadsheet tools, the ones in all the common office suites. iWork, OpenOffice, and Microsoft Office all have very basic spreadsheet tools that print just fine. They don’t give you all the help with colors and things that some of the better tools do, but you can take what you learn from something like Colorbrewer and apply it there. And those print great. I do try to avoid printing a lot of the time, especially when I have data that might be misinterpreted, so that I can be there when the data is shared, or I can see who’s accessed it and make sure it’s not getting spread around too far. But there’s lots of value in printing, and I understand that some people like it.
JOSH: Okay. Do you have a different approach when... well, you did a whole talk about quick and dirty visualizations. And those are great for, “Okay, I’ve got something I want to understand. I want to explore this numerical space better.” But there are also the visualizations that I think we do a lot in our work, of wanting to understand something on an ongoing basis. You talked about New Relic. It’s like, “Okay, what’s my performance report for the day?” or “What are my user conversion rates?” There are graphs or visualizations that you want to look at basically all the time, or every day, or every week. Do you find yourself approaching the things you want to see over and over again differently than the quick and dirty ones you only want to look at once or twice?
AJA: Sometimes. If it’s something I’m going to want to look at a lot, that requires continual uptime and analysis (performance data, error monitoring, stuff having to do with logging on a website), I often make that someone else’s problem. Very few companies I know have the time or want to roll their own when there’s a tool they can pay a moderate amount for and get everything they need. Then they can keep their development staff focused on their product. But a lot of projects have custom data visualization needs. When I worked at the education startup, we needed to be able to visualize our curriculum and visualize where a given student was in it, or how groups of students were progressing through it. We weren’t going to find an off-the-shelf solution for that. But we were also mostly going to use those things internally. So we hooked the quick and dirty scripts we had written up to the source control system, so that when the curriculum changed, those scripts automatically regenerated all of the visualizations we used. At any given time, the pictures were available at a specific file share and we could go get them. It happened regularly and we consulted them on a regular basis, but we didn’t have to do a lot of continuous work to keep them going. You can do the same thing with a cron job if there’s something you know you need to regenerate regularly in your workspace; you can have it run every night or every two hours. So as long as you’re not going to be showing it off to lots of other people who will care that it’s not pretty, your quick and dirty techniques can be used for a really long time. And that’s something I think a lot of people miss. They think these data visualizations have to be epic and beautiful. They don’t, as long as they get the point across.
JOSH: Okay. If I want to graph this thing over time, have you ever looked at fixing the axis ranges so that they stay consistent over time, even if your data isn’t fully populating the graph? So that things compare over time, apples to apples.
AJA: So with Highcharts, you can set your range and make that happen, and that’s pretty straightforward. I’ve got a personal project right now where I’m graphing weightlifting data. Time’s on the x axis and weight’s on the y, and we can set the range on the weight axis so it’s consistent and you can compare over time. With Graphviz, that’s a little harder. That was one of the things that got us when I was doing the curriculum graphing: we would add a bunch of curriculum to one section and the entire graph would flip over the y axis or something. So you couldn’t necessarily compare last week’s graph to this week’s graph to see what you’d added. I haven’t figured out a way to fix that in Graphviz, but the benefit we got from it made it worth it anyway.
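A sketch of pinning the axis the way Aja describes for her weightlifting chart. Highcharts reads `yAxis.min` and `yAxis.max` from its options object; this Python snippet only builds that object as JSON for a page to hand to Highcharts. The 0 to 300 range, the series name, and the `fixed_range_options` helper are illustrative assumptions.

```python
import json


def fixed_range_options(points, y_min=0, y_max=300):
    """Build Highcharts options with a pinned y axis so weekly charts compare apples to apples.

    points: list of (timestamp_ms, weight) pairs.
    """
    return {
        "chart": {"type": "line"},
        "title": {"text": "Lifting progress"},
        "xAxis": {"type": "datetime"},  # time on the x axis
        # Fixed bounds: the axis no longer rescales to fit whatever this week's data happens to be.
        "yAxis": {"min": y_min, "max": y_max, "title": {"text": "Weight"}},
        "series": [{"name": "Squat", "data": [[t, w] for t, w in points]}],
    }


week = [(1380585600000, 185), (1381190400000, 195)]
print(json.dumps(fixed_range_options(week), indent=2))
```

Because the bounds come from parameters rather than from the data, every regenerated chart shares the same y scale.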
JAMES: You can diff the DOT file. [Chuckles]
AJA: That’s not particularly good for showing the sales team what you’ve added curriculum-wise. [Chuckles]
JAMES: Yes, I hear you.
JOSH: Aja, I just love that you have answers for all of my harebrained questions here.
JAMES: Stop encouraging.
CHUCK: I have another question about collecting the data. So you mentioned that you like to use Ruby. You mentioned the Graphviz format. I guess what I’m wondering is I know a lot of people who collect analytics on their application, like New Relic except they customize it themselves. If you were going to do that, what kind of tools do you use for that? Do you use some kind of, most of them I think use some kind of NoSQL database, do you do that? Do you just put it into a regular relational database? Or do you collect it in a different way?
AJA: So I’ve never done this successfully. I’ve worked on several projects where we tried, and we always found that the dataset gets really big really fast, and we don’t get the time from management to handle those cases and deal with the ever-growing datasets that produce even bigger visualizations. Internal tools and internal analytics always seem to be the lowest priority for a lot of people. So I don’t know what the best solution is. I know I have used a relational database for it, and that database got very big and very slow very quickly. It was also a problem when we changed what we were collecting, because some records were sparse in some places and other records weren’t. I’d love to see what other people have done, if people have done this successfully. I always seem to fall back on New Relic or another common tool being good enough, or regular log processing being good enough, even though we’d like something more.
CHUCK: Yeah, that makes sense.
JAMES: Yes. It’s a very specialized problem that requires very specialized tools. I think what you said originally, make it somebody else’s problem, is extremely wise. [Chuckles]
AJA: Oh, my experience is that if it isn’t somebody else’s problem, it doesn’t happen. Because with stuff that isn’t visible to customers, unless things go really, really wrong, it never gets prioritized, or rarely gets prioritized.
JAMES: Good call. Alright, so data visualization. What else? What have we missed? Anything else we need to cover?
AJA: I think you guys got most of it. I think the big message from my talk is just use the data you have and use pictures to make the data you have more digestible. Convince people the data is important.
JOSH: Well, okay. I have a whole thorny issue here. And that’s animation.
AJA: Yes. Animation.
JAMES: Ah, yeah.
JOSH: So we haven’t talked about that at all. You didn’t really get into it in your talk at GoGaRuCo either, but that’s a big thing in data visualization: showing things changing over time. Do you do this? Do you have a preferred tool for it?
JAMES: I’ve often used a trick similar to Aja’s, where I just throw a bunch of images in a directory, stamping them out 01, 02, that kind of thing. And then there’s a tool on the Mac called GraphicConverter, which is basically the Swiss Army knife of image formats. I swear, sometimes it understands image formats that I don’t think have been invented yet. [Chuckles]
JAMES: It’s amazing what it knows. You can just point it at a folder of images like that, with numbered suffixes or whatever, and it’ll grab them and make a movie out of them. It’s a really cheap trick, but it works well.
DAVID: For the command line freaks out there, ImageMagick is amazing. The convert command can do pretty much anything. You can stitch images together into a gif file. You can explode a gif file out into individual frames, et cetera. That’s one of the tools that I like to use as well.
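The numbered-frame trick James and David describe relies on zero-padded names so the frames sort in order. A Python sketch that generates the names and assembles an ImageMagick `convert` command line (using its `-delay` and `-loop` options) without running it; the `frame_` prefix, the delay of 20 hundredths of a second, and `animation.gif` are assumptions.

```python
def frame_names(count, prefix="frame_", ext=".png"):
    """Zero-padded names (frame_01.png, frame_02.png, ...) so plain string sorting keeps frame order."""
    width = max(2, len(str(count)))
    return [f"{prefix}{i:0{width}d}{ext}" for i in range(1, count + 1)]


def gif_command(frames, output="animation.gif", delay=20):
    """Argument list for ImageMagick's convert: -delay is in hundredths of a second, -loop 0 loops forever."""
    return ["convert", "-delay", str(delay), "-loop", "0", *frames, output]


frames = frame_names(12)
print(" ".join(gif_command(frames)))
# To actually build the GIF (requires ImageMagick to be installed):
# import subprocess; subprocess.run(gif_command(frames), check=True)
```

Without the padding, `frame_10.png` would sort before `frame_2.png` and the animation would play out of order.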
AJA: I think they’re really cool and really powerful. I don’t know tools like D3 particularly well, but I’ve seen some amazing stuff done with them, and I want to learn more. When you only have 30 minutes to talk, you have to cut.
JOSH: Oh yeah.
AJA: You have to pick and choose what you’re actually going to talk about. Learning more about D3 was on my to-do list before this call, so I could sound more intelligent about it. But it didn’t happen. [Chuckles]
JOSH: I’ve played around with D3 a little bit, and I like the approach. It feels a lot like functional programming; it’s built up that way. You’ve got a bunch of little filters that transform things, and you just build up a pipeline that you dump your data into at one end, and out the other end you get this pretty picture. Sounds great, huh? [Chuckles]
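To make Josh’s pipeline description concrete, here is a Python sketch of the same shape: small functions chained together, including a linear scale that maps a data domain onto a pixel range (the idea behind D3’s linear scales). This is not D3’s API; `linear_scale`, `pipeline`, and the SVG output are illustrative assumptions.

```python
from functools import reduce


def linear_scale(domain, rng):
    """Return a function mapping [d0, d1] linearly onto [r0, r1]."""
    (d0, d1), (r0, r1) = domain, rng
    return lambda x: r0 + (x - d0) * (r1 - r0) / (d1 - d0)


def pipeline(*steps):
    """Compose steps left to right: data goes in one end, output comes out the other."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


y = linear_scale((0, 50), (0, 200))  # data values 0..50 -> bar heights 0..200 px

to_svg = pipeline(
    lambda rows: [r for r in rows if r["value"] is not None],  # drop missing data
    lambda rows: [(i, y(r["value"])) for i, r in enumerate(rows)],  # scale values to pixels
    lambda bars: [f'<rect x="{i * 25}" y="{200 - h}" width="20" height="{h}"/>' for i, h in bars],
    lambda rects: '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">'
    + "".join(rects)
    + "</svg>",
)

print(to_svg([{"value": 10}, {"value": None}, {"value": 50}]))
```

Each stage only knows about its input and output, so stages can be swapped or reordered independently, which is the property Josh is pointing at.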
AJA: Sounds fantastic.
AJA: It’s on the to-do list.
JAMES: The page for it, just the intro page for it, is very interesting and compelling.
JOSH: Yeah. It’s pretty impressive. I was really upset though that the project I wanted to use D3 on last year, I couldn’t because it didn’t actually support browsers far enough back in time.
JAMES: Yeah. I think that’s a common problem with some of the fancier stuff [inaudible] the compatibility, and Aja hits on that a lot. It’s one of the things that makes Highcharts worth it. We had to buy it for a site once, and the client balked at the price tag, $90 for a single site I think it is, which is silly. It’s like, “Yeah, but you’re asking us to support browsers back to the dawn of time.” And this one fee gets you that. That’s beyond reasonable.
DAVID: Yeah. For people still supporting IE 6, BJ Clark, RobotDeathSquad on Twitter, has the best quote for that which is, “Still support IE 6. People that use IE 6 are used to seeing the internet broken.”
JAMES: That’s got to be getting truer, with big players like Google dropping support for it.
DAVID: And he said that in 2010. So move on, people.
JAMES: Sure, sure.
CHUCK: The one thing that I want to ask about is behavior visualization. We talked a little bit about this with Graphviz with the workflows and things like that. I guess what my question is, is do you find that it’s much different visualizing the path through some code or some process or workflow in your software as opposed to visualizing a collection of data, compiling that into a chart or graph?
AJA: So one thing I’ve found is that when you’re trying to visualize the path through some code, you don’t have nearly as much data as when you’re looking at hundreds of data points that you need to draw, so it’s a lot easier to just do it by hand. I find myself falling back to the whiteboard or a piece of paper next to my desk a lot more often. I wish someone would make a really cool tool for drawing sequence diagrams, because I would totally use it all the time to see how data is flowing through a piece of code. Sometimes that’s the biggest problem: okay, what happens next? How do I get back from here? And it takes time to dig that out. It’d be cool if someone made a tool that could analyze code and do that. I fall back on the automated tools a lot more often when I have a lot of data and doing it by hand would be really time-consuming, or I’d be really likely to make a mistake because there’s just so much data.
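Aja wishes for a tool that works out the sequence of calls for her. A rough starting point in Python uses the standard library’s `sys.settrace` hook to record caller-to-callee pairs while a function runs. The `trace_calls` helper and the `checkout` example functions are hypothetical, and a real tool would need to filter out library frames and would conflict with debuggers or coverage tools that also use the trace hook.

```python
import sys


def trace_calls(fn, *args):
    """Run fn(*args) and return the ordered list of (caller, callee) function names."""
    edges = []

    def tracer(frame, event, arg):
        if event == "call" and frame.f_back is not None:
            edges.append((frame.f_back.f_code.co_name, frame.f_code.co_name))
        return None  # only call events are needed, no per-line tracing

    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    return edges


# Hypothetical code to trace.
def load_cart(user):
    return [user, "book"]


def charge(items):
    return len(items)


def checkout(user):
    return charge(load_cart(user))


for caller, callee in trace_calls(checkout, "aja"):
    print(f"{caller} -> {callee}")
```

The printed `caller -> callee` lines are in call order, which is the raw material for a sequence diagram; the first line shows the `trace_calls` wrapper invoking the entry point.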
JAMES: One thing I’d say about that, and I’ve actually mentioned this in a few of my recent talks, is that finally learning and understanding state machines in recent years has been super valuable to me. We use state machines everywhere and have no idea we’re using them. The web is all state machines. The way we do Active Record models with a published_at column or something, you’re defining a state machine there. Becoming aware of things like that, and then finding tools that draw them particularly well, these [inaudible]. Steve Klabnik talks in his book ‘Designing Hypermedia APIs’ about how one of the steps in designing a hypermedia API is basically to draw the state machine. [Chuckles] Because that’s what it is: interactions with an API. So I find that super valuable. My go-to tool for when I do it manually is OmniGraffle. Boy, it’s just awesome at that kind of stuff because you can just nest and tap it out and it’s really fast. Put them together and they work awesome.
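James’s point that a `published_at` column quietly defines a state machine can be made explicit. A Python sketch, assuming a hypothetical draft/scheduled/published lifecycle (an empty, future, or past `published_at`, respectively); the states and event names are assumptions, not taken from any particular gem. It validates transitions and emits DOT so Graphviz can draw the machine.

```python
TRANSITIONS = {
    # (current_state, event) -> next_state
    ("draft", "schedule"): "scheduled",
    ("draft", "publish"): "published",
    ("scheduled", "publish"): "published",
    ("published", "unpublish"): "draft",
}


def fire(state, event):
    """Return the next state, or raise if the event isn't allowed from this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot {event} from {state}") from None


def to_dot(transitions):
    """Render the transition table as a Graphviz digraph with labeled edges."""
    lines = ["digraph article {"]
    for (src, event), dst in transitions.items():
        lines.append(f'  {src} -> {dst} [label="{event}"];')
    lines.append("}")
    return "\n".join(lines)


state = fire("draft", "schedule")
state = fire(state, "publish")
print(state)
print(to_dot(TRANSITIONS))
```

Keeping the transitions in one table means the code that enforces the rules and the picture that documents them can never drift apart.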
JOSH: Okay, so back to basics just real quick. There are a lot of different kinds of graphs that you can use to visualize things, even just down to, “Oh, am I going to use a log scale versus a linear scale,” kind of thing. And is there some way that you approach this kind [of thing]? How do you choose a strategy for presenting information and figuring out whether you want to use a line graph or bar charts or what have you?
AJA: One of the advantages of using tools to do it is that you can try a couple of different ways of showing something, especially with charting, and see which one best communicates what you’re trying to say. So maybe you want to use a linear scale if you really, really want to emphasize that exponential growth. But if you’re not so worried about that and you want everything to fit on a single sheet of paper, maybe that’s the time to use the log scale when you want to show that exponential growth [inaudible]. A lot of this comes back to the stuff we learned in school about different kinds of charts: you use pie charts when you want to show how a whole is made up of different parts. You use line charts when you want to show a single variable over time, and the x-axis is nearly always your time variable, whatever that is. And the nice thing about automated tools is you can try a couple of different things and see which one looks best. I almost never use bar charts because I find them confusing, and there’s the whole area problem: if you search for how charts lie, you can read about bar charts and area. A lot of the time the area is not representative of the ratio between the values, and they can be really misleading that way. So I try to avoid them. And you don’t tend to see them used very often in the experimental sciences for the same reasons.
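A quick worked example of the scale choice Aja describes: on a linear axis, exponential growth crowds the early points against the bottom, while a base-10 log axis gives each tenfold step equal room so everything fits. The values and the 100-unit plot height are made up for illustration.

```python
import math

values = [1, 10, 100, 1000, 10000]


def linear_positions(vals, height=100.0):
    """Height each value reaches on a linear axis running from 0 to max(vals)."""
    top = max(vals)
    return [height * v / top for v in vals]


def log_positions(vals, height=100.0):
    """Height on a log10 axis spanning min..max; each tenfold step gets equal room."""
    lo, hi = math.log10(min(vals)), math.log10(max(vals))
    return [height * (math.log10(v) - lo) / (hi - lo) for v in vals]


print([round(p, 2) for p in linear_positions(values)])
print([round(p, 2) for p in log_positions(values)])
```

On the linear axis the first three points all sit within 1% of the bottom; on the log axis the five points land evenly at 0, 25, 50, 75, and 100.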
JOSH: Yeah. Yeah, I guess the big problem with bar charting is how you deal with zero. If you’re plotting a bunch of values that all fall between 100 and 101, then your distance from zero is 100 times the range of what you’re charting, so you can never really show that. The differences between point A and point B might be only a fraction of a percent, but they can look huge depending on how you choose your axes. So that’s my favorite way to lie with graphs: choosing your axes.
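Josh’s favorite lie is easy to quantify. For two hypothetical readings, 100.2 and 100.8, this compares how much taller the second bar looks when the axis starts at zero versus at 100.

```python
def apparent_ratio(a, b, baseline):
    """How many times taller bar b looks than bar a when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)


a, b = 100.2, 100.8
print(f"axis from 0:   {apparent_ratio(a, b, 0):.3f}")    # honest: matches b / a
print(f"axis from 100: {apparent_ratio(a, b, 100):.3f}")  # truncated axis
```

A difference of well under one percent turns into a bar that looks four times taller once the axis starts at 100.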
AJA: Mine too.
JAMES: Josh, you’re not supposed to be giving those tips.
AJA: I actually think there’s some value there.
CHUCK: I was going to, oh go ahead.
AJA: Learning how to lie and make your point. The whole point of visualization a lot of the time is to communicate a specific point. And playing with variables to see how you can make the point you’re trying to make more obvious I think is a valuable skill. As long as you realize that you are perhaps manipulating the perceptions of your audience. I also think it just makes people better consumers of information generally, if they understand how those things work.
CHUCK: Yeah, I was going to say to Josh, that was spoken like a true choose your own opposing ideology.
JOSH: Yeah, I guess so.
JAMES: On this topic of trying several different things, I will say Highcharts is pretty cool for that, because there are these different classes and they all take a bunch of parameters, but most of the time they take very similar parameters, enough that if you’re sticking to the basics, it’s the same. So a lot of the time you can just change the name of that constructor call and boom, you’ve got a totally different graph. You may have to tweak a few things, but it really is impressive for switching between various chart types.
JOSH: Yeah. Do you have a favorite fun way of graphing something? I remember in high school, Valentine’s Day, they had the polar coordinates equation that drew a cute little valentine heart. Do you find yourselves ever doing whimsical graphing or something that’s more of an artistic expression?
AJA: I don’t. I tend to do my artistic expression with non-technical tools.
AJA: But I have had fun in the past. When someone says something completely inane in a meeting, going back to my desk and whipping up a very quick representation of why they’re wrong or why their assumptions are very wrong and just sliding it across the desk to them and saying, “I understand that you think that, but you are wrong and here is why.” [Laughter]
DAVID: This is why you’re dumb.
JOSH: You are so wrong that I was able to create a graph of why you are wrong.
CHUCK: There you go.
JAMES: I want the graph plugin that creates the crayon drawings like Aja had in her talk.
AJA: That would be so cool.
JAMES: Yeah, it would be great.
CHUCK: Yes, sign me up.
AJA: Anything to keep people from showing the customer what you’re creating.
JOSH: I think if you just use old ink cartridges on the printer, it has the same effect. [Chuckles]
JAMES: Oh, jeez. Who knew?
AJA: I’m sure someone’s done CSS that’s the graphic equivalent of Comic Sans.
DAVID: Balsamiq is a UI mockup tool that is deliberately intended to look like sketches on a napkin so that when you do show it to the customer, they don’t think it’s done. [Chuckles]
CHUCK: Yup. Yeah, I’ve seen several visualization tools like that for Keynote and things like that as well. But it does give it some character and it makes it visually more interesting to look at.
JOSH: Well, I think if you’re using rough, scribbled lines, that communicates, “Oh, this is rough data.” So if you can present the data and use these out-of-band communication cues, that’s helpful. I’d love to have that crayon graphing package.
CHUCK: Yeah, no kidding.
CHUCK: Alright, well are there any other aspects of this that we should cover before we wind down and do the picks?
JAMES: I think drawing things is cool.
CHUCK: Yeah, and I really do like the idea of just sitting down and sketching it out on a napkin if that’s what you’ve got. I’m going to have to keep that one in mind. Alright, well let’s do the picks then. David, what are your picks?
DAVID: Okay, well I’ve got three today. The first one came up in the pre-call: InternetTrafficReport.com. If you’ve ever wondered why the internet is slow today, you can go to this website and it shows all of the major backbone routers for the internet all around the world, whether they’re up or down, what their current health level is, ping time, and that sort of thing. And when there’s a big state-sponsored DDoS on a country, you can actually see it in the traffic report, which is fun and interesting. For my other two picks, I debated really hard. I’ve got one that’s really classy and one that’s classic Dave Brady, I guess. For the classic David Brady one, I’m going to link to a YouTube video. This is work-safe if your work is okay talking about poop. It’s the funniest commercial I’ve ever seen. But consider the source and remember my sense of humor.
DAVID: And that’s all I’m going to say about it to keep the show classy. But it is quite funny. It’s for a real product that you can use in the bathroom. That’s all I will say about that. My last pick, which is very classy and absolutely adorable: on Netflix there is a new series called Too Cute. They give you a warning at the beginning: “Warning, the material in this program is too cute. Viewer discretion is advised.” And then it’s 45 minutes of puppies and kittens and ducklings, and you get to watch them from birth, to when they open their eyes, to adoption day when they’re six or eight weeks old. It’s Cute Overload: the movie, basically. Or the TV series, basically. It’s adorable, a good warm, fuzzy, fun thing to watch. And them’s my picks.
CHUCK: Very nice. James, what are your picks?
JAMES: So I wanted to use my pick time to talk a little bit about a problem that I hope we all know the tech community has with gender diversity, and how that sometimes leads to bad things like sexual assault. I’ve been reading about this topic for a long, long time. I maybe have 40 hours of studying nothing but this topic, and I still feel like a total idiot, not confident giving a lot of advice on it. But I will say a couple of things. One is that this problem occurs pretty naturally because of our diversity problem, so it helps to understand how it comes about from the lack of diversity in the tech community. I’m going to put a link to this article in the show notes. The real purpose of the article is something unrelated, and I would advise you to just ignore that part. But the article does have some really good visualizations, and we’re talking about data visualizations. This particular one shows how, when you have a diversity imbalance, it creates problems like this, and why. It was super, super eye-opening for me and very powerful, so I recommend you go look at those. The other thing I think we should start realizing is that this is an issue we all share, especially us men who don’t get hit with this problem as much, and how it affects us. There’s a really good TED talk about how it’s our issue and not just a specific group’s issue, so I would recommend you watch it. These are very introductory pieces, I think, on this topic. But it’s really important, it’s something that troubles our tech community, and it’s something we need to start becoming aware of. So you’ve got to find your way into understanding this stuff, and I think these two links are a great way to start.
CHUCK: Awesome. Alright, Josh what are your picks?
JOSH: I’m going to pass this week.
CHUCK: Alright. I’ve got a couple of picks. The first one is a game I’ve been playing on my iPhone called DragonVale. Basically you get to raise and breed dragons. It’s fun. I’ve also been playing another one called Dragon City, which is similar but not the same, and you actually get to battle with your dragons in that one. So those are fun. The other one I’m going to pick is a new system I’ve been playing with called Office Autopilot. It’s not cheap, but it appears to be able to do all of the automation stuff that I want to do in my business, so I’m really excited to give it a try. I actually spent about a half hour talking to their guys last night, and I’m going to be getting some more introductory support tomorrow and then again on Monday. They’re actually going to help me set up the whole thing so that it manages things like my marketing funnel. So I’m really excited to see where this will all go. You can go check it out at OfficeAutopilot.com. I also have an affiliate link that I’ll put in the show notes. So if you sign up, I would appreciate the kickback, but ultimately, if you go check it out on my recommendation and you like it, that’s good enough. So anyway, those are my picks. And Aja, what are your picks?
AJA: So I’ve got one nerdy pick, and that’s the book Realm of Racket. It’s from No Starch Press, and it’s an introduction to the Racket programming language, which is a dialect of Scheme, which is a dialect of Lisp. But it teaches Racket through games. We’ve been working through it in a study group based out of Seattle.rb, and it’s a lot of fun. We’re about two-thirds of the way through, and I’ve made a game that involves killing various kinds of slime monsters. I’m currently working on this crazy dice game that’s kind of like Risk and kind of not. There’s a lot of really good learning involved. I worked through Structure and Interpretation of Computer Programs about 18 months ago, and I’m still learning cool things from Realm of Racket, so it’s not just really basic Scheme. And it’s very approachable, which is the other thing I really like about it. It starts out pretty easy and [inaudible]. Then I have two picks that are completely unrelated to nerdy stuff. I’ve been doing a lot of conference speaking this year and I’ve really enjoyed it, and I have two things that have helped me get better at it. One is a Ze Frank video called How to Public Speaking. I’ll put up a link to that. I like it because Ze Frank is hilarious, and I think it covers the real basics of how to give a good speech without being overly long or overly dogmatic about it. It just covers the simple basics of how to feel confident and how to give a good speech at the same time. The last one is Toastmasters. I did Toastmasters for about 18 months at a previous job. A lot of people seem to have a knee-jerk negative reaction to Toastmasters, but the group I was with was fun. Nobody was particularly good, and we learned a lot from each other. We had a lot of folks who were relatively inexperienced English speakers, so we all helped each other out.
And I felt it made me a better speaker and made me feel a lot more confident when I was up in front of a group. So if people want to get into public speaking, I highly recommend Toastmasters. Those are my picks.
DAVID: Thought it was cool that you picked Realm of Racket and you were describing slime monsters and a dice game and I’m like, “That sounds like Land of Lisp.”
DAVID: And then I just looked it up online and Conrad Barski’s one of the authors, so that’s cool.
AJA: Yeah, it’s very similar to Land of Lisp. I have not worked through Land of Lisp but I’m enjoying the heck out of Realm of Racket.
CHUCK: Awesome. Alright. Well, before we wrap up, I do want to mention our silver sponsor, and that is Elixir Sips. You can find them at ElixirSips.com, or you can go to the website and click on the banner. It’s a RubyTapas-type thing for Elixir, so you get a couple of videos a week. Anyway, it’s pretty awesome, so go check it out. And our next book club book is
JAMES: Functional Programming for the Object-Oriented Programmer.
CHUCK: There we go. So go check it out. There should be a, do we have a discount code for that?
JAMES: We do have a discount code and I always forget what it is, so maybe we’ll put it in the show notes.
CHUCK: Yup, we’ll get it in the show notes then. But thanks for coming, Aja. It’s really interesting to talk about this stuff, especially since a lot of it really doesn’t boil down to how you code it. It boils down to how you think about it.
AJA: Yeah, this was a lot of fun. Thanks, you guys, for having me.
DAVID: Yeah, thanks for coming.
CHUCK: Alright. And with that, we’ll catch you all next week.