The Ruby Rogues

The Ruby Rogues podcast is a panel discussion about topics relating to programming, careers, community, and Ruby. We release a conversation with notable programmers and Rubyists each week to help programmers advance in their careers and skills.


239 RR Swiss Army Rubyknife with Peter Cooper at Ruby Remote Conf 2015


This episode is from Peter Cooper’s talk at Ruby Remote Conf 2015. You can watch the full, unedited presentation, Swiss Army Rubyknife, on YouTube at your convenience.

  • Check out All Remote Confs for next year’s remote conference lineup!
  • Ruby Remote Conf will run from March 23rd-25th 2016. Buy a ticket or submit a CFP!
  • JS Remote Conf is running from January 14th-16th 2016. Check out the speaker lineup!
  • Freelance Remote Conf will run from February 24th-26th. We’ve already got a great list of confirmed speakers. Stay tuned for more details!

We, the Ruby Rogues panelists (and Mandy!), wish you a very happy holiday season.


TRANSCRIPT

[This episode is sponsored by Hired.com. Every week on Hired, they run an auction where over a thousand tech companies in San Francisco, New York, and L.A. bid on Ruby developers, providing them with salary and equity upfront. The average Ruby developer gets 5 to 15 introductory offers and an average salary offer of $130,000 a year. Users can either accept an offer and go right into interviewing with the company or decline it without any continuing obligations. It’s totally free for users. And when you’re hired, they give you a $2,000 signing bonus as a thank you for using them. But if you use the Ruby Rogues link, you’ll get a $4,000 bonus instead. Finally, if you’re not looking for a job but know someone who is, you can refer them to Hired and get a $1,337 bonus if they accept the job. Go sign up at Hired.com/RubyRogues.]

[Snap is a hosted CI and continuous delivery service that is simple and intuitive. Snap’s deployment pipelines deliver fast feedback and can push healthy builds to multiple environments automatically or on demand. Snap integrates deeply with GitHub and has great support for different languages, data stores, and testing frameworks. Snap deploys your application to cloud services like Heroku, DigitalOcean, AWS, and many more. Try Snap for free. Sign up at SnapCI.com/RubyRogues.]

[This episode is sponsored by DigitalOcean. DigitalOcean is the provider I use to host all of my creations. All the shows are hosted there along with any other projects I come up with. Their user interface is simple and easy to use. Their support is excellent and their VPSs are backed by Solid State Drives and are fast and responsive. Check them out at DigitalOcean.com. If you use the code Ruby Rogues, you’ll get a $10 credit.]

[This episode is brought to you by Braintree. If you’re a developer or a manager of a mobile app and searching for the right payments API, check out Braintree. Braintree’s new v.zero SDK makes it easy to support multiple payment types with one simple integration. To learn more and to try out their sandbox, go to BraintreePayments.com/RubyRogues.]

CHUCK:  Hey Ruby Rogues listeners. This week, due to some unforeseen circumstances we had some issues getting everybody together to do the episode. So, I’m actually going to pull a talk that was given at Ruby Remote Conf this year in June by Peter Cooper. He called it the Swiss Ruby Army Knife or something like that. Anyway, go check it out. Let me know if you like it. This is going to kind of be my go-to for when we don’t record. That, or I’ll just get on and maybe monologue about something. But in the meantime, I really hope that this helps out. If you like this format, we’re doing Ruby Remote Conf again in March. You can go to RubyRemoteConf.com and check it out. We also have a Call for Proposals open if you’re interested in that. So, feel free to jump in and check it out.

Finally, I’m also doing JS Remote Conf and Freelance Remote Conf in January and February. So, you can find those two at JSRemoteConf.com and FreelanceRemoteConf.com. If you want to find out more about any of the other conferences I’m putting on this year, you can go to AllRemoteConfs.com and check those out as well. And here’s Peter Cooper.

Peter is the author of ‘Beginning Ruby’, editor of Ruby Weekly, and chairs O’Reilly’s Fluent Conference. He publishes programming-oriented email newsletters, podcasts, and screencasts full-time. You might know him from Ruby Weekly or JavaScript Weekly, among others. And I’ve known Peter for a long time; he’s super friendly and very, very knowledgeable. So, we’ll turn it over to Peter for his ‘Swiss Army Rubyknife’ talk.

PETER:  Thank you very much, Chuck. Don’t believe all the rumors about being knowledgeable. I just put together what I find online and manage to get by. [Laughs] So, I guess if I have any skill, it’s taking what I see out there and moving it into a form that other people can enjoy and learn from.

So, one thing that I mentioned to Chuck, actually (I have the webinar stuff on my other screen, so I can see you typing and so on), is that my upload bandwidth here is really, really bad. So, when I get to going through my slides and doing some of the live coding I’m going to show you, I’m going to turn my webcam off.

So yeah, I’m Peter Cooper. I just want to quickly cover a couple of the things I do, just in case you don’t know about them, because they’re relevant to what I’m going to talk about. RubyFlow is basically a community website where you can post news and links of your own. People seem to do that every day, and I pick up tons and tons of awesome stuff from it for Ruby Weekly, which Chuck mentioned, and which kind of began the whole line of things that got me to where I am now. I have about [inaudible] different newsletters going out on a weekly basis, and it’s pretty much my entire business.

And it’s funny actually, because Chuck asked me to talk. We’ve known each other for a long time in various ways, and I was briefly on the Ruby Rogues podcast that he does. So, definitely subscribe to that. But I don’t really [inaudible] a lot anymore. So, that was the first thing that came to mind. He’s like, “Do you want to talk?” and I’m like, “Well, I don’t really program.” I have a client from many, many years ago, an e-commerce client whose site I still maintain, but that’s pretty much it. I’m not one of these $200-an-hour consultant types. So I thought, “Do I really program? Do I have anything to share?” And then I realized I actually program a ton. I now use programming as a tool to do the other things I need to do in my day-to-day business.

So, I’m just a publisher, basically, and I use code in that all the time. And not just for work; for things in my personal life as well. I’m going to cover a couple of those as we go along. So, what I want to get across in this talk is that even if you aren’t doing Ruby all the time, you can use Ruby as an interesting side tool to get other things done. I want to show off some interesting examples of that. There are a few humorous ones as well, and also some things other people have used Ruby for on the side to get things done, rather than for their big enterprise-y or big Rails apps. What else can you get done with it?

And what I’m going to do is I’m going to turn my video off now. You can see how lovely and clean my office is. But I’m just going to turn the video off. So yeah, this is pretty much me on a day-to-day basis.

As you can tell, I have a subscription to Shutterstock because we use a lot of images here. So I thought I’d better make full use of those images.

But does anyone remember delicious.com, or del.icio.us as it was originally called? It was basically the first major tag-based bookmarking service. You could click a bookmarklet in your browser, it would save a link into your delicious account, and you could tag it. So, I might tag, say, the Ruby Rogues podcast with Ruby and podcast and Chuck, or whatever tags come to mind. And actually, the popular page on delicious was almost like Hacker News or reddit before they even existed. It was the place where a lot of people in the know went to see the coolest stuff of the moment. This is like 2005, around that era. But eventually, the guy behind it, Joshua Schachter, I believe his full name is, sold it to Yahoo, and Yahoo kind of messed it up like they do with many things. It got slow and horrible, and eventually it kind of half shut down.

People fled to things like Pinboard, which is in the screenshot here, which actually went back to a more bare-bones, just-share-your-links kind of approach. I had tons of stuff on delicious, probably three or four thousand bookmarks at the time. I used it all the time and really enjoyed it. But I realized I wanted my own thing. I didn’t necessarily want to jump to another service that might let me down. So, I thought, “Well, hang on. How can I create this? I could build a cool Rails app and just replicate what Pinboard are doing.” So, I thought, “Well, what’s involved? What are the models and the ideas involved in an app like this?”

So, I realized there were basically four different concepts. There’s the bookmark itself. There are tags that the bookmarks can be organized by. I wanted it to be searchable, so if I think in the future, “Well, I want to find a Ruby video,” I can search for Ruby and video and anything tagged with both will come up. And last but not least, it needs to be synced in some way: either hosted centrally or some way I can get it between all my different machines, because I don’t want every machine to have its own list of bookmarks.

Now, web browsers like Chrome do syncing of bookmarks and stuff now, but at the time they didn’t. So, I had a brainwave. I thought, “Hang on. I could build this big Rails app and everyone will want to use it.” Or they won’t, and it will just be me, and it will break, and I’ll have to deploy it. But I thought, what are bookmarks? Bookmarks are essentially URLs. And what are URLs? They’re text. So, I got to thinking, well, tags are also text. So, we’ve just got a ton of text, essentially. Searching, you can kind of do that with grep at the command line, fair enough. And syncing? Well actually, at the time, I’d just installed something for syncing data between all my machines. It was called Dropbox. So, I thought, hang on. Text files, data files, searching, Dropbox: there’s an interesting thing to try here.

So, I’ll give you a very quick demo of what I came up with, and several years later I am still using it. I’ll bring my prompt into play here. Now, this is really, really basic stuff. In my links folder here, I have two things: something called L, and something called L.txt. And they’re both very simple. If I do a tail of L.txt, it looks like [inaudible]. If I stretch my [inaudible] so that we get things onto individual lines a bit more, you can see that all I have in this text file are URLs followed by a title, then a date following that, and then at the end, if I scroll across, you can see I have various tags. It’s literally just a bunch of text, as I was describing.

So, what is in L? Now, I’m going to show how extremely cool I am and load this in nano, because I haven’t got it loaded in Sublime Text, and I’m too scared to use vim in case I show myself up and keep hitting escape over and over until I quit. As I said, I’m not doing this all the time. So, this is very, very simple. It just brings in two different libraries. One is Readline [inaudible], so you can use the arrow keys to move up and down the history and so on at the prompts within this little program. I’ll show you how it works in a minute.

The other is pismo, a library I wrote that can go and fetch a web page and grab the title from it, grab an excerpt from it, try to tag it, all that sort of stuff. Then we get access to the L.txt file that we just saw. I tweaked this from the old way to the modern Ruby 2 way: it gets hold of the current directory using a Ruby 2 feature and then loads the .txt file from there. And then there’s this very simple class called Link, which is basically just an internal, pure Ruby representation of the links within that file. It’s parsed with a regex. Really, really simple, bare-bones stuff.
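The contents of L aren’t shown in the talk, so here is only a minimal sketch of what a regex-parsed Link class like the one described might look like. The file format is an assumption: one bookmark per line, with a tab-separated URL, title, ISO date, and space-separated tags. The `links_file` helper also assumes the “Ruby 2 feature” is `__dir__`, which returns the directory of the current source file and replaces the older `File.dirname(__FILE__)` idiom.

```ruby
require "date"

# Assumed format, one bookmark per line, tab-separated:
#   URL <TAB> title <TAB> YYYY-MM-DD <TAB> space-separated tags
LINE_PATTERN = /\A(?<url>\S+)\t(?<title>[^\t]*)\t(?<date>\d{4}-\d{2}-\d{2})\t(?<tags>.*)\z/

# Hypothetical data file living next to the script. __dir__ (Ruby 2.0+) is
# the directory of the current source file.
def links_file
  File.join(__dir__, "l.txt")
end

Link = Struct.new(:url, :title, :date, :tags) do
  # Build a Link from one line of the file, or nil if the line doesn't match.
  def self.parse(line)
    m = LINE_PATTERN.match(line.chomp)
    return nil unless m
    new(m[:url], m[:title], Date.iso8601(m[:date]), m[:tags].split)
  end

  # Serialize back into the same one-line format.
  def to_line
    [url, title, date.iso8601, tags.join(" ")].join("\t")
  end
end

# Load every well-formed line; a missing file just means no links yet.
def load_links(path = links_file)
  return [] unless File.exist?(path)
  File.readlines(path).map { |line| Link.parse(line) }.compact
end
```

Because each line round-trips through `to_line`, adding a bookmark is just appending one more line to the file, which keeps it greppable and friendly to Dropbox syncing.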

There’s a little mechanism for an interface to add links to the file, to list all the links in the file, and to find things in the file. And that is it. Then we have this bit at the bottom here, which is literally the entirety of the script’s actual activity: it just reads what comes in on the command line. I don’t use any kind of framework or system for this; it’s all done completely in the raw. If anyone wants the code for this later on, just let me know and I’ll share it. It’s so simple, though. One of the goals of this is to encourage you to do things like this for yourself. Just do things in a scrappy way, and it just works for years and years.
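A framework-free dispatcher in the spirit of that “bit at the bottom” might just branch on the first command-line word. This is a hypothetical sketch: the rule that a URL argument means “add” and any other words mean “search” is inferred from the demo, and the helper names are illustrative rather than the original script’s.

```ruby
# Decide what to do from the raw command-line words: no option parser,
# just a look at the first argument.
def dispatch(args)
  first = args.first
  if first.nil?
    [:list, nil]                        # `l` on its own: recent links
  elsif first.match?(%r{\Ahttps?://})
    [:add, first]                       # `l <url>`: add a bookmark
  else
    [:find, args]                       # `l ruby video`: search
  end
end

# Append one already-formatted line to the data file.
def add_line(path, line)
  File.open(path, "a") { |f| f.puts(line) }
end

# The last n lines of the file, like `tail`.
def recent_lines(path, n = 10)
  return [] unless File.exist?(path)
  File.readlines(path, chomp: true).last(n)
end
```

In the script itself, the bottom would then be little more than `action, arg = dispatch(ARGV)` followed by a `case action`.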

So, just to show you how it works: I have this symlinked into /usr/local/bin so it works from anywhere. If I type L on its own and run it, it shows my last ten links, with the date, URL, and so on. Now let’s say I want to add a link to the system. I’ve got a browser tab here of a Wikipedia article, and this is how I use it. I literally type L at the command prompt (I always have a command prompt open, so this is second nature to me) and just drag the tab in. And ta-da, the URL appears. I hit enter, pismo goes and fetches the page, and it suggests a title for me. It will then also try to suggest some tags. Some of them are okay. I’d probably do world. I might do Wikipedia, a few things like that. Hit enter, and it’s done. I can now see it’s in my system. And if I type, say, L Wikipedia counties, because in a few weeks I’ll remember, [inaudible], “Oh, those counties from Wikipedia,” it will [inaudible] that.

And actually, [inaudible] I’ve kind of messed it up because I was refactoring the code today. So, there’s my embarrassing moment. But what it would generally do is go and find the items with those tags and present them in a list just like these ones here. And if a single entry came back, it would automatically load that in the browser. Just to show you how it would do that, if it actually worked… let’s have a look. Where is it? It’s this line here. On OS X, I’m literally using open and then a URL, which opens that URL in your default web browser. So, very simple, very cool, very easy.
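The multi-term search and the open-if-exactly-one-result behavior could look roughly like this. It’s a sketch that assumes links are stored one per line with the URL as the first tab-separated field, and it matches terms anywhere in the line (closer to grep than to a strict tag lookup). `open` is the macOS command that hands a URL to the default browser, so the helper does nothing on other platforms.

```ruby
# Keep lines containing every search term (case-insensitive), so
# `l ruby video` only matches links that mention both words.
def find_lines(lines, terms)
  wanted = terms.map(&:downcase)
  lines.select { |line| wanted.all? { |term| line.downcase.include?(term) } }
end

# macOS's `open` command hands a URL to the default browser. Passing the
# URL as a separate argument avoids shell-quoting problems.
def open_in_browser(url)
  return false unless RUBY_PLATFORM.include?("darwin")
  system("open", url)
end

# Print all matches; with exactly one, also open its URL (the first field).
def show_results(matches)
  matches.each { |line| puts line }
  open_in_browser(matches.first.split("\t").first) if matches.size == 1
end
```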

And I’ve used this system, as I said, for years and years. I haven’t felt any need to go to Pinboard. And yeah, as I also said, I refactored it today to make the code look a bit… and I’ve broken it. So, moving these things out of the way, that’s one of the first things that I’ve got.

And you might have noticed I wasn’t even using JSON or YAML or anything there. I was just using plain text. I could improve it in different ways; I could even use SQLite or something like that. But I realized that you can actually build many different apps this way. If you just have something that is literally data, some scripting working on it, and a syncing element like Dropbox, that’s a ton of apps I can have across all my different machines. No one else can use them, of course; they’re just for me. But if I wanted to keep a diary, I’d probably use this approach. For anything where I want to store, collect, and sort through data, I’ve just seen so much promise in this whole Ruby plus text plus Dropbox approach, and I use it for a variety of different things. So, definitely keep that in mind. It works so well.
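As an illustration of the Ruby plus text plus Dropbox pattern applied to the diary idea, here is a hypothetical sketch. The `~/Dropbox/diary.txt` path is an assumption (any folder your sync tool watches would do), and the one-entry-per-line format is chosen so the file stays greppable.

```ruby
require "time"

# Hypothetical location inside a synced folder; the sync tool, not this
# script, is what gets the file onto every machine.
def default_diary_path
  File.expand_path("~/Dropbox/diary.txt")
end

# Append one entry per line: an ISO 8601 timestamp, a tab, then the text
# (newlines flattened so each entry stays on one line).
def add_entry(text, path: default_diary_path, at: Time.now)
  File.open(path, "a") { |f| f.puts("#{at.iso8601}\t#{text.tr("\n", " ")}") }
end

# Return every entry whose line mentions the query, ignoring case.
def search_entries(query, path: default_diary_path)
  return [] unless File.exist?(path)
  File.readlines(path, chomp: true).grep(/#{Regexp.escape(query)}/i)
end
```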

One thing that I used to use this for, and no longer do because I have lots of different editors now, was the link collection system for my newsletters: Ruby Weekly, JavaScript Weekly, and so on. All that was, was a Rails app, a completely full-blown Rails app. But it was hosted within a Dropbox folder and used SQLite 3 as the database. Now, I have a big crush on SQLite, so I’ve used it in a ton of places over the years. But I thought this was a really funny use for it, because it kept the database synced up between all my different machines. So, I could easily work on my links from my laptop or desktop, and it all worked fine.

Now, a lot of people said to me (this is me, basically; again, my Shutterstock subscription coming in handy), “You’re kind of playing with fire here. You’ll end up with the database not syncing between machines properly, and it’s just going to be a giant mess.” I seem to get lucky. Other than at the office here, [inaudible] connection in most places I go. And I always make sure my Dropbox syncs before I do anything else. So, I seemed to get lucky and it all just worked. If it doesn’t, [inaudible] you just get a copy of the database and you can fix and patch it up. But I seem to do alright.

Now, one other thing that I did with this scrappy approach was my newsletters when I launched them. This was just under five years ago, when I started my first one, Ruby Weekly. I wanted a mechanized way of doing it. I didn’t want to do what most people do right now, which is go to MailChimp or a system like MailChimp and sit there in the interface putting the newsletter together. It takes forever. These WYSIWYG things are so slow. I’m a developer. I’m [inaudible] easy and all that type of thing. I want an easy way of doing it. So, I just wanted to put my links in some sort of XML or YAML or JSON or whatever format that [inaudible] out, and then have that data converted into the other things I need. What I needed basically was an email producer, and what I needed to start with was some HTML and some text to come out. Then I could literally copy and paste these into MailChimp and it would just work. I wouldn’t have to spend more than five minutes in MailChimp to send an issue.

So, I started to have a think. I had my items; I wanted to push them through some templates and then get them out. And then I thought, well, hang on, there’s another step to this. Instead of just getting them out and copying and pasting them into MailChimp, I can use MailChimp’s API. And very quickly, this is what I did. Now unfortunately, I’ve lost all the code from that era, but I produced a YouTube video about it for some people who were interested, and I’ve done some screen grabs of it just to give you a very quick look at how it worked. I’m sorry if this is a bit blurry. It no longer works this way, but the first newsletters were YAML files.

So, basically, this is like my introductory paragraph. And then all these things going down the file were the items: Rails 3 Release Candidate came out, RVM 1 came out, back in 2010. It literally had all these items. And then I had basically an HTML ERB template, and all it did was pick through the data within those YAML files and render it out into the HTML I wanted. Very, very scrappy. I had a similar one for doing a plain text email, which worked in a very similar way, but obviously without the HTML tags.

This was pretty much the entire thing: it went through the text and HTML templates, opened up the YAML file, opened up the ERB template, and then ran the ERB template against the content from that YAML file. And that produced an issue that looked like that. Super, super simple, and that’s all it was to start with. And even [inaudible] all these people come to me and say, “I want to launch my own newsletter, but I don’t want to waste too much time putting it all together.” It’s like, you’re a programmer. This took me an hour at most to get working. And then all you have to do is hook it up to the API of whatever email system you want to use.
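The original code is lost, but the YAML-plus-ERB step is easy to sketch. The issue data, the field names (`intro`, `items`, `title`, `url`, `blurb`), and both templates are made up for illustration; only the overall shape (load YAML, run an HTML template and a plain-text template against it) comes from the talk.

```ruby
require "yaml"
require "erb"
include ERB::Util # provides h() for HTML-escaping inside templates

# A made-up issue in the shape described: an intro, then a list of items.
ISSUE_YAML = <<~YAML
  intro: "This week's Ruby news."
  items:
    - title: Rails 3 Release Candidate
      url: https://example.com/rails-3-rc
      blurb: The first release candidate is out.
    - title: RVM 1.0
      url: https://example.com/rvm-1
      blurb: RVM reaches 1.0.
YAML

# HTML version; with trim mode "-", <%- ... -%> lines leave no blank lines.
HTML_TEMPLATE = <<~'ERB'
  <p><%= h(issue["intro"]) %></p>
  <%- issue["items"].each do |item| -%>
  <p><a href="<%= h(item["url"]) %>"><%= h(item["title"]) %></a>: <%= h(item["blurb"]) %></p>
  <%- end -%>
ERB

# Plain-text version of the same issue; no escaping needed.
TEXT_TEMPLATE = <<~'ERB'
  <%= issue["intro"] %>
  <%- issue["items"].each do |item| -%>
  * <%= item["title"] %>: <%= item["url"] %>
  <%- end -%>
ERB

# Run a template against the parsed issue; `issue` is visible via binding.
def render(template, issue)
  ERB.new(template, trim_mode: "-").result(binding)
end

issue = YAML.safe_load(ISSUE_YAML)
html  = render(HTML_TEMPLATE, issue)
text  = render(TEXT_TEMPLATE, issue)
```

From here, the two strings could be pasted into an email tool by hand or handed to its API, which is the next step described in the talk.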

MailChimp has a great API. Just hook it all up, run the code, and have it push the issue into a new campaign. You could even have it send from that one script. You can be up and running so quickly. So, when people ask me, “What’s the special sauce to your business?”, initially it was this type of thing. It’s a lot more elaborate now, but I just began with this and built it up step by step. I didn’t build a massive application that was awesome from the get-go. I guess this is a theme with me: I just build something that’s basically a pile of trash, make it run, get the result I want, and then improve it over time. Definitely not one of these perfectionist programmers.

So, another thing that I can credit Ruby for is coming up with my youngest daughter’s name. Here is my daughter yesterday, having fun. Or perhaps a nicer picture, on the beach. She’s called Imogen Grace Cooper and she is two. We couldn’t come up with a name when we were at the hospital. (Yeah, Thomas in the channel said, “It’s simply great not being a perfectionist programmer.” I totally agree.) So yeah, we were sat in the hospital and had time to kill. I had my laptop, of course, because I’m sending newsletters all the time. And I thought, well, hang on. We can’t come up with a name. Maybe there’s an approach for this. So, I’ll give you a very quick live demo of what we did.

Now, I’m going to bring this in. I have this file here called names.rb. Perhaps I’ll just give you a nicer view of it here. So, we had some names that we liked the idea of. Obviously the last name we couldn’t change; it was going to be Cooper and we’d stick with that. But then we had all these first names. And being English and not quite middle class, obviously we aimed for very [inaudible] names that make us sound better than we are. So, we came up with things like Millicent and Eloise and so on and so forth. Some of these are very English. And then some middle names we liked the sound of a lot: Holly and Lily and things like that.

All I literally made was a script that looped around three times, used the sample method of Array to pick out something from each list, and then used the say command on OS X to actually read the names out. Because with some names you might think, “Oh, I like that first name. I like that middle name,” but it just sucks when you say the whole name out loud. Now, I don’t think this system here shares sound, so I don’t think you’re going to be able to hear this. But I’ll run it anyway just to see if it does. If not, I’ll pretend to do it.
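Here’s a sketch of a names.rb along the lines described. Millicent, Eloise, Holly, and Lily are the names mentioned in the talk; adding Imogen and Grace is an assumption, and the real lists were longer. `say` is the macOS text-to-speech command, so the `speak` helper only calls it on macOS and just prints elsewhere.

```ruby
# Partial, partly assumed lists (see above); the surname is fixed.
FIRST_NAMES  = %w[Millicent Eloise Imogen].freeze
MIDDLE_NAMES = %w[Holly Lily Grace].freeze
LAST_NAME    = "Cooper"

# Pick one first and one middle name at random with Array#sample.
def random_name(rng = Random.new)
  [FIRST_NAMES.sample(random: rng), MIDDLE_NAMES.sample(random: rng), LAST_NAME].join(" ")
end

# Three candidates per run, matching the loop described in the talk.
def candidates(count = 3, rng = Random.new)
  Array.new(count) { random_name(rng) }
end

# Print the name, and read it aloud with macOS's `say` when available.
def speak(name)
  puts name
  system("say", name) if RUBY_PLATFORM.include?("darwin")
end
```

A run is then just `candidates.each { |name| speak(name) }`; passing a seeded `Random` makes a given run repeatable.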

So, I don’t know if you can actually hear it say that. But it just reads them out using the robotic voice that you get on OS X. And yeah, basically we repeated this process over and over, saying them ourselves and [inaudible] them backwards and forwards. That’s how we came up with the name. It was as simple as that. Very, very simple stuff, and it worked. So, now we have Imogen Grace Cooper and it all went very well.

Another thing I did, which is more recent: I was taking my blood pressure for various reasons and storing it in Evernote. Obviously, these are not very healthy blood pressures. If you’re seeing these, you’ll probably think, “Oh god, why hasn’t he died of a heart attack?” You know, I was just checking [inaudible] about a year ago or so. But I thought, well, hang on. I want to get averages of this, and I don’t want to sit there with a calculator. I thought I’d just use Ruby for this. So, I just ran up some simple code and pasted the readings in. And this is a cool little trick if you’ve never used it: just use percent and then curly braces. It lets you define a string with almost anything in it, essentially other than other curly braces. I copied it all in and got all of this information into my variable, A, rather creatively. I split the stuff up around the forward slashes, converted everything into numbers, and then used inject to get averages. So, I could see what the [inaudible] diastolic and systolic pressures were, and just check out what was going on with the old blood pressure.
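The averaging trick can be sketched like this. The readings are made-up sample values, not the ones on screen, and the one-“systolic/diastolic”-per-line format is an assumption. One small correction to the aside about `%{}`: Ruby does allow curly braces inside the literal as long as they’re balanced; only an unmatched `}` ends it early.

```ruby
# Made-up readings pasted as one string, one "systolic/diastolic" per line.
# %{...} is an alternative double-quoted string literal.
READINGS = %{
  130/85
  140/90
  120/80
}

# Split on whitespace, then on "/", convert to integers, and average each
# column with inject.
def bp_averages(text)
  pairs = text.split.map { |reading| reading.split("/").map(&:to_i) }
  systolic, diastolic = pairs.transpose
  [systolic.inject(:+) / systolic.size.to_f,
   diastolic.inject(:+) / diastolic.size.to_f]
end

systolic_avg, diastolic_avg = bp_averages(READINGS)
puts "Average: #{systolic_avg.round}/#{diastolic_avg.round}"
```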

Another little thing that I needed to do… let’s just cherry-pick a few examples here from my scrapbook folder. One time I got hold of a leaked song. Now, I can’t remember what it was, but obviously it was very naughty and probably illegal for me to get that. I wanted to share it with someone I knew. And because I am extremely paranoid, I wanted to make sure there was nothing in the ID3 tag. I think I might have even bought the song or something. It was something I wanted to share anyway, and I shouldn’t have been doing it. I wanted to see if there was anything in the ID3 tags that was shady, or an identifier, or anything like that. There were apps on Windows for reading ID3 tags, and I didn’t trust iTunes and stuff. So, I just thought, is there a Ruby thing that can read out the data from within the mp3 file? And there was. So, I literally just put it in here, passed in my mp3 file, and looked at all of the raw data in there. And I think there actually was some kind of identifier in there somewhere. So, I managed to find some site that scraped the ID3 tags. So, very naughty, but it did the job.
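The talk doesn’t name the library used. As a dependency-free stand-in, this sketch reads only the old ID3v1 tag: a fixed 128-byte block at the very end of the file that starts with `TAG`. Most modern files also (or instead) carry an ID3v2 tag at the start, which needs a proper library, so treat this as an illustration of the idea rather than a complete audit.

```ruby
# Field names and byte widths after the 3-byte "TAG" marker (ID3v1 layout).
ID3V1_FIELDS = [[:title, 30], [:artist, 30], [:album, 30], [:year, 4], [:comment, 30]].freeze

# Decode a 128-byte ID3v1 block into a hash, or nil if it isn't one.
def parse_id3v1(tag)
  return nil unless tag && tag.bytesize == 128 && tag.start_with?("TAG")
  offset = 3
  fields = {}
  ID3V1_FIELDS.each do |name, width|
    # Fields are padded with NUL bytes or spaces; drop both.
    fields[name] = tag.byteslice(offset, width).delete("\0").strip
    offset += width
  end
  fields[:genre] = tag.getbyte(127) # single-byte genre index
  fields
end

# Read the last 128 bytes of a file and try to decode them as ID3v1.
def read_id3v1(path)
  File.open(path, "rb") do |f|
    return nil if f.size < 128
    f.seek(-128, IO::SEEK_END)
    parse_id3v1(f.read(128))
  end
end
```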

But one that I had literally last week: I had a customer who had domains registered for them, and they [inaudible], “What are all my domains?” And there were tons of them. So, I went to the site that manages the domains, and it gives you a drop-down box of all the domains you have there. But I just couldn’t be bothered working [inaudible] or getting some output from it that made sense. I literally went to view source and copied all of the option tags (these are the different domains they’ve got here). I pasted the HTML into a variable, then literally ran scan over it with a regex and joined it all back together. And I had a nice list that I could copy and paste into an email. Or, if I so wished, since we have the data, I could put commas in and export it into a CSV or something like that. But again, it’s a job that someone who wasn’t a programmer could have sat there typing out for 20 minutes. It literally just took a minute using a bit of Ruby.
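The view-source-and-scan trick might look like this. The `<option>` markup and domain names are invented placeholders, and a regex like this is only dependable for simple, predictable markup such as a single drop-down, not for HTML in general.

```ruby
# Placeholder for the <option> tags copied out of the page source.
OPTIONS_HTML = <<~HTML
  <option value="1">example.com</option>
  <option value="2">example.org</option>
  <option value="3" selected>example.net</option>
HTML

# Capture the text between each <option ...> and </option>.
def domains_from_options(html)
  html.scan(%r{<option[^>]*>([^<]+)</option>}).flatten.map(&:strip)
end

domains = domains_from_options(OPTIONS_HTML)
puts domains            # one per line, ready to paste into an email
puts domains.join(",")  # or comma-separated, ready for a CSV
```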

And so, this brings me to… actually, [I’m going to move everything around on] my desktop here because I kind of messed it up. This brings me to scraping. Scraping is a very, very common thing for people to do on the side in this way, and there are tons of ways to do it. I usually do it in the messiest way possible. You’ve probably gotten that theme from me by now.

So, recently we had some packages delivered to our office. In the UK, Amazon uses a company called DPD to do a lot of its deliveries to Prime customers. DPD has a site, which sadly I don’t have a screenshot of, where you can actually see where your driver is on a map. So, if I order something from Amazon right this minute, tomorrow morning it will say, “You’re going to get something from a delivery driver. Steve is going to deliver something. Click here and you can see a map of where he is and what delivery number you are,” and all that kind of stuff. So, I thought, well, hang on. It’d be really cool to know when he’s nearby, not just when he gets here, because you get a text message when your Amazon delivery arrives. But I want to know when he’s nearby, so I can look out of the window and see [inaudible].

So, I went onto their site and noticed it wasn’t loading immediately. It loaded the web page and the map, but then it would take 10 seconds to load where he was. So, I used the Chrome Dev Tools to monitor what was coming over the wire, and I noticed they were loading a JSON payload over Ajax that had the location info in it. Now, if I had copied and pasted that URL, it wouldn’t have worked, because it was all based on the session and everything. But if you right-click on a request in the Network tab of Dev Tools, there’s an option, something like “Copy as cURL”, that gives you a curl command exactly replicating the request. So, I did that. I used the backticks thing in Ruby to get that JSON payload into Ruby on an automated basis, and made it fetch every 30 seconds or so. Then I made it connect to a Slack chat room that all of our employees are on, and had it monitor the progress of the driver. And he was actually called Steve.
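Here’s a rough sketch of that polling loop. The curl command is a placeholder for whatever “Copy as cURL” produces (the real one carries session cookies), the `lat`/`lng` key names are assumptions since the payload’s structure isn’t described, and posting through a Slack incoming webhook is one possible way to reach the chat room, not necessarily the one used.

```ruby
require "json"
require "net/http"
require "uri"

# Placeholder for the command from DevTools' "Copy as cURL"; the real one
# includes session cookies and headers. This URL is hypothetical.
CURL_COMMAND = "curl -s 'https://tracking.example.com/driver-location.json'"

# Run curl via backticks and return the raw response body.
def fetch_payload
  `#{CURL_COMMAND}`
end

# Extract [latitude, longitude]; the key names are assumptions.
def parse_location(json)
  data = JSON.parse(json)
  [Float(data.fetch("lat")), Float(data.fetch("lng"))]
end

# Post a message to a Slack incoming webhook, which takes a JSON body
# with a "text" field. The webhook URL is something you create in Slack.
def notify_slack(webhook_url, text)
  Net::HTTP.post(URI(webhook_url), { text: text }.to_json,
                 "Content-Type" => "application/json")
end

# Fetch the driver's position every `interval` seconds, yielding each fix.
def poll(interval = 30)
  loop do
    yield parse_location(fetch_payload)
    sleep interval
  end
end
```

Wired together, the block passed to `poll` would work out the distance from the office and call `notify_slack` only when there’s something new to say.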

What I had it do was just check every now and then what was going on. Now initially, it wasn’t very well written. It gave the distance, but a bit too precisely. I found a piece of Ruby code that could take latitudes and longitudes and work out the distance between them. Eventually I added rounding and various other features, so it wouldn’t keep hassling us if he didn’t move, and stuff like that. But [inaudible] we got a good system. And yeah, sorry for the swear words here, but it got a bit excited when Steve was just around the corner. Is the screen frozen? I’m just noticing on here. Can you see a lovely picture of a delivery driver at the moment? If not, I can try and restart the sharing. Yay, good. So yeah, there’s this lovely picture. That’s not Steve, of course; this is some guy from some American TV show that I don’t know. One of my employees recognized him, anyway. But if you did want to see Steve, there is actually the real Steve delivering an actual package to our actual office at the time. So, it all worked very, very well, and next time DPD is going to deliver, we’ll get that bot up and running once again.
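The distance code mentioned isn’t shown, but a standard way to get the distance between two latitude/longitude points is the haversine formula. This sketch uses a mean Earth radius of 6,371 km; the rounding thresholds and the “only speak when something changed” check are illustrative guesses at the behavior described.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius

# Great-circle distance in km between two points given in degrees.
def haversine_km(lat1, lng1, lat2, lng2)
  rad = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlng = (lng2 - lng1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Round to something human-friendly: tens of metres up close, else 0.1 km.
def distance_message(km)
  km < 1 ? "#{(km * 1000).round(-1)} m away" : "#{km.round(1)} km away"
end

# Only return a message when it differs from the last one sent, so a
# driver who hasn't moved doesn't trigger repeat notifications.
def next_message(last_sent, km)
  message = distance_message(km)
  message == last_sent ? nil : message
end
```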

So yeah, one thing someone mentioned in the channel is Mechanize. Now, I was going to mention Mechanize, because if you’re going to do things properly, that’s what you use. With Mechanize, you can script it to go to a web page, fill in forms, click buttons, and then read the responses and so on. So, that may have been another way of getting that JSON payload without using curl in a messy way. But of course, I’m not here to give you the proper way of doing anything.

So, I actually want to show you another interesting technique that I’ve never seen anyone else use, but which I think is really, really handy. It again involves scraping, but it’s a different approach from the Mechanize-style approach. I’m going to do live coding for this. So, let’s say, let [inaudible] that often comes up for me. For some reason, I’ll need to get access to some data from somewhere. I just picked this as an example today for the benefit of this talk: I want to get a list of all the counties in the UK. Now unfortunately, even if there are lots of CSV and text data sources out there for things, I never seem to get hold of them, and I find them quite difficult to find. I usually end up on a page like this.

So, this is a list of the counties of the UK, which are vaguely equivalent to US states, at least in number. There are about a hundred and something. I live in [inaudible]. The problem is, if I come through here and start copying and pasting, I’m going to end up with a real mess. Hashing all this out is going to be really difficult. I have a slightly different approach. There’s a very, very cool bookmarklet, which I’m hovering over here, called SelectorGadget. If you google for SelectorGadget, you’ll find it easy to install, and there’s even a Chrome Extension for it now. What SelectorGadget does is let you pick content off a web page and then tell you what CSS rule you need to use to get access to those items.

So, let me show you how it works. I click SelectorGadget. It loads up. And I’ll make the window a bit smaller here. So now, when I hover over an element, it highlights that element. I want to get access to the counties that are listed down here, so I’m going to click one of these. Now, it marks in yellow what it has matched. So, TD is literally every single table cell. Well, I don’t want these ones over here. So, what you do is click again to mark the things you do not want to match. I don’t want it to match this one. So now, it’s rather cleverly worked out that I probably just want the first column, and given me a rule to access just that first column. The problem is that I don’t really care about Northern Ireland. And I don’t mean that in a racist way. I just don’t want to get access to those counties there. Nor do I want to get the counties of my northern compatriots. So, what I’m going to do is try to negate those as well. So, I click there. It gets rid of that.

And actually, this is behaving differently from the Chrome extension, which got this right first time. What I’m going to need to do is start adding on some of these. So, if I click here… and actually, this has done a very poor job of this. When I used the extension, it gave me a rule. And I think that’s because it’s a newer version. So, let me see if I’ve got a newer version of SelectorGadget. This is a complicated rule, I will grant you. Let’s try this one. Right there. Not that one. And then get rid of Northern Ireland. There we go. Yeah, the old one and the beta one. So, I guess the beta one is better. So, to get access to these, here’s the rule, very, very simple. But the problem is, well, how do I get those out now? Well, I could bring up… actually it’s picked that one up as well. There we go. That’s the rule. I recognize that now. It’s ‘.wikitable’.

I could open up a JavaScript console and then use jQuery or just any old way to get access to these things. But I know Ruby and so do you, so I want to use that. So, what I’m going to do is create [inaudible] a new file here called counties.rb. And I’m going to literally step through this, step by step, doing what I would normally do to implement it. So, the first thing I want is Nokogiri to parse the HTML. I know there are a few different options now, but I’m going to stick with that. I want open-uri because I want to fetch the page. So, I’m going to copy and paste this into here. Lovely. Then I’m going to load the page and get a Nokogiri representation of it. And then I’m going to use the css method with the CSS rule that gets the data.

So, I’m going to copy and paste the rule out and put that in. Now, I’m going to move this over so you can see everything I’m doing. I’m going to say each, with ‘el’ for element, let’s say. And then I’m going to do ‘puts el.text’, which will just print out the pure text representation. Now, if I’ve got this correct, let’s see. I don’t normally run stuff within Sublime Text, but I’ll give it a go here, because it will look better with the output down there. So, Command-B. Of course, it’s moved it all the way over to the side. Right. So, as you can see now, it has found all of the county names, and I can work with them, play with them however I wish. But there might be other things that I want to do. So, I’ve got hold of my counties, and you can scrape so many web pages using this type of technique.

But let’s say I want to get the counties that existed before 1889, for example. So, I want to find the ones that have got ticks here as well. Well, this becomes a little bit more complicated. One thing I could do is just say whether ‘before 1889’ is true. But what I’m going to do is write another piece of code here to access the next element along. And I’ve got ‘td:nth-child(2)’. Well, the next element along is actually nth-child(2), but I need to access it from the parent row. So yeah, I do that. Right. So, that will get access to that. And then I want the inner HTML, because I want to see whether it’s got that tick in it or not. And I’m not going to demo this, but I just know that it contains the word Yes with a capital Y. Because doing this match will return a number, and I want a true or false, I’m going to do this in a really scrappy way and double-bang it so that it basically gives me a true or false. Kind of a little bit [pedantic], things like that. I want a true or false rather than a 10 or a nil. Like so.

So, I can say, print ‘Didn’t exist before 1889’. Just like that. This is just totally [inaudible]. Yeah. I don’t know why I do it. So, I’m doing it in a really messy way, but I’m really just trying to make the point, because normally you’d export this to something else. But if I now run that code, it’s going to break and not work, because I’ve probably screwed something up. Item doesn’t exist. Oh, it’s not item. It’s el. Run it again. There we go. So, now it shows me the counties that I’ve got, but it also tells me the ones that didn’t exist before 1889.
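
Peter’s script used Nokogiri, open-uri, and the SelectorGadget rule. To keep this sketch dependency-free, it swaps in REXML with XPath instead (not what he used): `//table[@class='wikitable']/tr` plays the role of the ‘.wikitable’ selector, the first `td` of each row is the county, and the second cell is checked for “Yes” with the same double-bang trick. REXML only accepts well-formed markup, so for a real Wikipedia page Nokogiri remains the practical choice; the table below is hypothetical sample markup with placeholder names.

```ruby
require 'rexml/document'
require 'rexml/xpath'

# Hypothetical, hand-written sample standing in for a Wikipedia "wikitable".
SAMPLE = <<~HTML
  <table class="wikitable">
    <tr><th>County</th><th>Existed before 1889</th></tr>
    <tr><td>County A</td><td>Yes</td></tr>
    <tr><td>County B</td><td></td></tr>
  </table>
HTML

def scrape_counties(html)
  doc = REXML::Document.new(html)
  # XPath stand-in for the CSS rule: every row of the table with class "wikitable"
  rows = REXML::XPath.match(doc, "//table[@class='wikitable']/tr")
  rows.each_with_object([]) do |row, result|
    cells = row.get_elements("td")
    next if cells.empty? # the header row uses <th>, so it has no <td> cells

    # cells[1].to_s is the cell's serialized markup (roughly the talk's inner HTML);
    # =~ returns an index or nil, and !! turns that into true/false
    result << { name: cells[0].text, before_1889: !!(cells[1].to_s =~ /Yes/) }
  end
end
```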

And I do stuff like this all the time. I’m always messing around, scraping for, you know, tons of different things. I have to scrape data from all over the place, especially from sites that just do not have machine-readable data. These types of approaches can help. And the good thing is that you can now tack on, perhaps, JSON: turn it into some kind of output data structure, do ‘output[el.text] = whatever it is’, build these output structures, and then output them as JSON, that kind of thing. I’m not going to show you any demos, but you can use your imagination. Just tons of scrappy stuff, tons of fun.
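
A minimal sketch of the ‘output[el.text] = whatever it is’ idea, assuming the scrape has already produced name/flag pairs; `counties_to_json` is a hypothetical helper name, and the values in the test are placeholders.

```ruby
require 'json'

# Collect scraped values into a plain Ruby hash first, then serialize it.
# `rows` is assumed to be an array of [name, before_1889] pairs.
def counties_to_json(rows)
  output = {}
  rows.each do |name, before_1889|
    output[name] = { "before_1889" => before_1889 } # output[el.text] = whatever it is
  end
  JSON.pretty_generate(output)
end
```

Building the hash first keeps the scraping loop separate from the output format, so the same structure could just as easily be written out as CSV or YAML.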

So yeah, moving on back to here. So, one of the things that I do best as a publisher is I [inaudible] other people to do my work for me. [Chuckles] So, in that vein, I asked the world what they were doing and some of the scrappy things that they’d used Ruby for over the years. And I got probably about 20 responses. [Inaudible] a few of them as I come to a conclusion here. So, sorry if these are a little bit hard to read. I’m going to read them out. So, someone called ‘Y 8’, and I couldn’t find out what his real name was, said, “I’m scraping data from an ultrasonic range sensor to get the water level for a well” that he has control over. I’ve got the code for it here. He linked up the code. It’s very basic stuff. I’ve increased the font size a little bit so you can see it better. It’s running on a Raspberry Pi, getting the data over a serial cable. He reads that in and just does a variety of stuff with it. I just thought that was kind of cool.
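
The well-monitoring code itself isn’t reproduced in the talk. A minimal sketch of the reading side, assuming (hypothetically) that the sensor sits at the top of the well, reports one distance in centimetres per line, and that the well is 500 cm deep; taking any IO means a serial device file or a StringIO can be passed in.

```ruby
require 'stringio'

# Hypothetical depth from the sensor down to the bottom of the well.
WELL_DEPTH_CM = 500.0

# Water height = well depth minus the measured distance down to the surface.
# Returns nil for lines that aren't a number (noise on the serial line).
def water_level(line, well_depth_cm: WELL_DEPTH_CM)
  reading = Float(line.strip, exception: false)
  reading && well_depth_cm - reading
end

# Collects water levels from every parseable line of a finite IO.
def water_levels(io, well_depth_cm: WELL_DEPTH_CM)
  io.each_line.filter_map { |line| water_level(line, well_depth_cm: well_depth_cm) }
end
```

For a live serial port, which never reaches end-of-file, you would call `water_level` inside `io.each_line { |line| ... }` as each reading arrives rather than collecting them with `water_levels`; the port’s baud rate would need configuring separately.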

Actually, I’ll get rid of this one. Chris Mar said that Rake’s great for moving around lots of files, like sorting photos and stuff like that. Again, he gave me an example of this in action. So, this is a Rakefile. I’ve never used Rake for this before, because I find Rake a little bit hard to understand [inaudible] times. But he has these files with timestamps in the name, and he has a task here that will go through, pick out the different parts, create directories based on the filenames, and then move the files into the relevant folders. So, I thought that was kind of cool to see as well.
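
Chris’s Rakefile isn’t shown in full, so here is a sketch of the same idea with FileUtils, assuming (hypothetically) that filenames start with a `YYYYMMDD` timestamp and that photos should land in `year/month` folders. The method could be called from a Rake task, as the comment at the end shows.

```ruby
require 'fileutils'

# Hypothetical naming scheme: "20150314_101500.jpg" -> year 2015, month 03.
TIMESTAMP = /\A(\d{4})(\d{2})(\d{2})/

# Moves timestamped files in `dir` into dir/YYYY/MM and returns their new
# relative paths. Files without a timestamp, and directories, are left alone.
def sort_photos(dir)
  moved = []
  Dir.children(dir).sort.each do |name|
    path = File.join(dir, name)
    next unless File.file?(path)

    match = TIMESTAMP.match(name)
    next unless match

    year, month = match[1], match[2]
    target = File.join(dir, year, month)
    FileUtils.mkdir_p(target)
    FileUtils.mv(path, File.join(target, name))
    moved << File.join(year, month, name)
  end
  moved
end

# In a Rakefile this could be wrapped as, for example:
#   task(:sort_photos) { sort_photos(ENV.fetch("PHOTO_DIR", ".")) }
```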

Someone called Intern Hack said renaming scanned PDF invoices by matching the content against a list of supplier names. Now, as soon as I saw this, I thought, I do this as well. This is actually how I manage some of the expenses in my business. I save PDFs into a folder and then I use one of the many PDF reading libraries that exist for Ruby to read out the total, the VAT, which is a kind of sales tax here, the name of the vendor, all that sort of thing. Then it turns that into a CSV that I can import or give to the bookkeeper, that type of thing. So again, that sort of thing is easily possible. There are so many Ruby libraries out there that can read different types of files that you might not expect, and so many that can output things. So, almost any kind of input/output task that you might otherwise do manually, you can get Ruby to do so much of it. Even OCR stuff. It’s just ridiculous what you can do.
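
Extracting text from a PDF needs a gem (pdf-reader is one common choice) and is left out here; this sketch assumes the text has already been extracted. It shows the matching and CSV steps: find a known supplier name in the text, pull labelled amounts with a regex, and emit one CSV line per invoice. The supplier names, labels, and £-amount format are all hypothetical.

```ruby
require 'csv'

# Hypothetical supplier list to match against the invoice text.
SUPPLIERS = ["Acme Hosting", "Widget Stationery"].freeze

# First supplier whose name appears anywhere in the text (case-insensitive).
def match_supplier(text, suppliers = SUPPLIERS)
  suppliers.find { |name| text.downcase.include?(name.downcase) }
end

# Amount following a label such as "Total: £120.00"; \b stops "Total" matching
# inside "Subtotal". Returns the amount as a string without thousands separators.
def extract_amount(text, label)
  m = text.match(/\b#{Regexp.escape(label)}\b:?\s*£?([\d,]+\.\d{2})/i)
  m && m[1].delete(",")
end

# One CSV line per invoice: file, supplier, VAT, total.
def invoice_csv_line(filename, text)
  CSV.generate_line([filename, match_supplier(text) || "UNKNOWN",
                     extract_amount(text, "VAT"), extract_amount(text, "Total")])
end
```

The matched supplier name could equally be used to rename the file, which is the approach Intern Hack described.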

Joel Hooks said, “I recently used it to cobble together a CSV for mass DMCA notices to YouTube.” Now, you might think, “Oh, that’s a bit evil.” But he’s involved with a site called Egghead.io, which does really cool video tutorials on JavaScript and stuff. So, I’m just guessing here, but he doesn’t work for Universal or anything. He’s actually protecting his own videos, which is fair enough.

And then Paul Campbell, and again this is something else I need to do, said that before he got automatic bank feeds into Xero, he wrote a tool for it. Xero is quite a popular online accounting system that people use, at least here in the UK and I think in Australia and New Zealand. I don’t know if it’s big in the US or not. His tool took the CSV that a bank puts out and imported it, using the Xero API I guess, or possibly just by converting it into a CSV that Xero likes.
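
Paul’s tool isn’t shown, and both column layouts below are hypothetical: check your bank’s export and the importer’s template before relying on a mapping like this. The sketch reads the bank CSV by header name and writes the columns back out with different names and in a different order.

```ruby
require 'csv'

# Hypothetical mapping: bank export header => header the importer expects.
COLUMN_MAP = {
  "Transaction Date" => "Date",
  "Amount"           => "Amount",
  "Description"      => "Payee",
  "Reference"        => "Reference"
}.freeze

# Reads rows by header name, so the bank's column order doesn't matter, then
# writes them out in the order given by the mapping.
def convert_bank_csv(input, column_map = COLUMN_MAP)
  rows = CSV.parse(input, headers: true)
  CSV.generate do |csv|
    csv << column_map.values
    rows.each { |row| csv << column_map.keys.map { |source| row[source] } }
  end
end
```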

So, let’s have a quick look at a couple of these things. [Inaudible] looked through others to show you. One was from a guy called Daniels, who’s in Scotland. He was learning to drive, and he wanted to know when a slot opened up at the UK equivalent of the DMV for his practical driving test. So, we do a theory test on paper, and then a practical test where you actually go out in the car and do stuff, and they’re on totally different occasions. You have to pass the theory before you do the practical. But getting an appointment can be tricky. So, he created this script [inaudible]. And literally, all it does is use Mechanize, in fact. Let’s bring it up. So, here’s the Mechanize script, and again, I’ll increase the font size. It literally goes to the UK government site, looks to see what dates and times are available, and there you go. It’s really simple stuff. And then once it found a time, it would send him a text message using Twilio, so he would know all about it.
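
Daniels’s script used Mechanize to check the booking site and Twilio to send a text; neither API is reproduced here. Instead, `check` and `notify` are hypothetical callables you would back with those libraries, and the loop waits between checks, which also keeps the polling polite to the site. The sleeper is injectable so the loop can be exercised without real waiting.

```ruby
# Polls until `check` returns a non-empty list of slots, then calls `notify`
# once and returns the slots. Gives up (returning nil) after max_attempts.
def poll_until_found(check:, notify:, interval: 600, max_attempts: 48,
                     sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do |attempt|
    slots = check.call
    if slots.any?
      notify.call("Slots available: #{slots.join(', ')}")
      return slots
    end
    # Don't sleep after the final unsuccessful attempt.
    sleeper.call(interval) if attempt < max_attempts - 1
  end
  nil
end
```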

And there’s one that I found really… [inaudible] pointed me to this blog post, which I don’t remember reading at the time, but it’s from 2013. This guy called Andy Jiang used IFTTT, that’s ‘if this then that’, Mandrill, the email system, and Twilio to find an apartment in San Francisco. What he did was use ‘if this then that’ to look for new Craigslist listings for the rough kind of apartment he wanted: a certain number of square feet, in the location he wanted, at the right price, and everything. He then had that, he had a recipe that was [inaudible] him.

So, he set up Mandrill to handle incoming email. It would parse the email and work out whether there was a phone number associated with the listing by looking at the format. He had a regex for that: is there a phone number in here that’s within the San Francisco area? And then he would actually use Twilio. If you look at the code here, it would call the person on the listing, call him as well, and then connect the two calls together. So literally, as soon as someone put up an ad that suited him on Craigslist, it would get emailed into his program and it would use Twilio to connect them in a call. Now, that would be creepy if I’d just put a listing up and immediately got a call. But I guess that’s how it goes in San Francisco. It’s that kind of place for renting an apartment. So, he did that and eventually [inaudible] an apartment using that technique. So, hats off to him. I think it was [inaudible] splendid.
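
Andy Jiang’s actual regex isn’t shown in the talk. A minimal sketch of the idea, assuming US-formatted numbers and treating 415 as the San Francisco area code (an assumption; pass your own list). It returns the first matching number in E.164 form, which is the kind of string a telephony API usually expects.

```ruby
# US-style number: optional parentheses around the area code, optional
# space/dot/dash separators, and not embedded in a longer run of digits.
PHONE = /(?<!\d)\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})(?!\d)/

# First number in `text` whose area code is in `area_codes`, as "+1XXXXXXXXXX";
# nil if there is none.
def local_phone_number(text, area_codes: ["415"])
  text.scan(PHONE).each do |area, prefix, line|
    return "+1#{area}#{prefix}#{line}" if area_codes.include?(area)
  end
  nil
end
```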

So, that’s actually pretty much it for me. Just before I conclude, I need to go through a few extra slides, which don’t actually mean anything, because I got my creative director here, Jess, to put together some slides, but I didn’t end up actually using any of them. So, I just thought, as long as they get [inaudible] here, then her hard work wasn’t in vain. And so, I [inaudible] to ask you to subscribe to all my things. You can just go and find Ruby Weekly, JavaScript Weekly, [Inaudible] Weekly, all the cool stuff. You’ll have a good time. So yeah, that’s pretty much it. So, Ruby is basically my Swiss army knife. I use it for everything. I can’t use a real Swiss army knife to save my life. But I’m constantly using Ruby as a tool to get me out of boring, manual copy-and-paste jobs, to scrape information, and even in the formation of my business and my whole idea of what now makes me all of my salary and everything.

So, I think you can do it as well. Just don’t be afraid of using Ruby in this scrappy, messy way. And don’t give a [inaudible] monkeys if anyone looks at your code. It’s not important. What’s important are the results. So yeah, thank you very much. And I am more than willing to take [questions] if you have any. And obviously I just want to thank Chuck for putting on this awesome event as well. So, [inaudible].

I already have a first question. So yeah, Doug’s asking whether I’m familiar with the DATA constant in Ruby. Yeah, that’s the thing where, if you put the, what’s it, the ‘__END__’ marker on, you can then read the data out of the current file that you’re in. So, here you go. He’s linked to an example. Yeah, so you use ‘__END__’ and then ‘DATA.read’. Yeah, I’ve used it before. I just don’t tend to use it. I don’t know why. I [inaudible] it fully makes sense, actually, with some of the situations where I was using the ‘a =’, just to give you an example. If I just live-refactor this, let’s do this just to see whether I’ve still got any skills whatsoever. Here we go. ‘__END__’. Stick that information down the bottom there. And then I’ll say ‘a = DATA.read’. If I run that, there we go. So, yeah, it runs in exactly the same way. I’ve just noticed, the screen doesn’t seem to be updating. At least it’s not on the widget I’ve got here.
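
A minimal sketch of the refactor Doug suggested. In a standalone script, everything after an ‘__END__’ line is not executed, and Ruby exposes it through the DATA constant, an IO positioned just after that line. DATA is only defined when the file containing ‘__END__’ is the program being run, so here a StringIO stands in for it; `read_names` and the sample names are hypothetical.

```ruby
require 'stringio'

# Reads one name per line from any IO, skipping blank lines.
def read_names(io)
  io.each_line.map(&:strip).reject(&:empty?)
end

# In a standalone script you would end the file with:
#   __END__
#   County A
#   County B
# and call read_names(DATA). StringIO stands in for DATA here.
SAMPLE_DATA = StringIO.new(<<~TEXT)
  County A

  County B
TEXT
```

Because `read_names` takes any IO, the same method works unchanged on DATA, a File, or a StringIO.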

CHUCK:  It’s updating fine for us.

PETER:  Okay, cool. Okay, so cool. So yeah, that’s how you’d use what Doug asked about. So yeah, awesome point. I should really start using that. It’s like many things in Ruby. I just forget about them, and unless someone reminds me, I don’t start using them. So, thank you very much, Doug. Anyone else? I know this is not the sort of talk that really yields many questions, because it’s really me just screwing about. So, I shall sit here for a couple of minutes and see what you have to say. But yeah, I just want to thank you all for coming and listening to this as well. So, I’m literally just about to leave and head on a four-hour drive to London. So, it was nice to have a nice, pleasant talk to give, not anything too complicated.

CHUCK:  On your scraping, I’m wondering a little bit about if they use JavaScript to place stuff, then your CSS selector wouldn’t work. Or does SelectorGadget account for that?

PETER:  I believe SelectorGadget can cope with it. The problem you would have is obviously when you do this read. Doing that might read embedded JavaScript, but it’s not going to run it.

CHUCK:  Right.

PETER:  Yeah, so what you can do in that case is, I can’t remember what the Ruby equivalent is called now, but use a PhantomJS-type setup. You can do it with Capybara, can’t you, when you do testing, where you load a page, have it run the JavaScript, and then it returns the result into Ruby. I can’t remember the name of the one for Ruby specifically. But there are things that will read and run JavaScript and then allow you to work with the result. So, you could use something like that. But yeah, it’s very rare that I scrape things like that. But it is definitely possible.

So yeah, Josiah’s asked, are there legal issues if you are scraping? I’ve probably given the impression that [inaudible] doesn’t really care a lot for formalities. So, yeah, there might be. I don’t know. But I’m never scraping anything that someone’s put a lot of time and effort into and that’s copyrighted or whatever. It’s literally things like lists of names or words or geographic information, stuff like that. So, I don’t really worry too much. But, you know, I would probably give a legal disclaimer: when you’re scraping, just be careful. Because some people really don’t like it if you keep hitting their site. And I obviously couldn’t have that with DPD. I didn’t want to hit their site every 10 seconds for the location, because they might have just tried to block me or whatever. So, you’ve always got to be careful when you do that type of scraping.

And yeah, Chuck says the company publishing owns the copyright for what they produce, which I guess is kind of true. I guess the only reason I will do it, if I need to in certain cases, is that in many cases data isn’t copyrighted. It’s a very complicated issue. But if it’s like a list of English place names, you can’t actually copyright that. At least that’s true for public information here. But there are tons of data sets out there and things like that. Yeah, just be careful. Just don’t go and scrape someone’s news site and republish it as your own news or something. Be careful, kids.

CHUCK:  Alright.

PETER:  Cool.

CHUCK:  Well, thank you, Peter.

PETER:  Yeah, thank you [inaudible].

CHUCK:  If you want to go out and write some scrappy code.

PETER:  Yeah, just go and write tons of scrappy code. That’s the way to go.

CHUCK:  Alright. Well…

PETER:  Yeah, I’ve given you permission.

CHUCK:  [Chuckles] There you go. You can put Peter’s stamp of approval on your scrappy code.

PETER:  Yeah, I think that’s actually what a lot of it’s about. Just get permission. Go and do it.

CHUCK:  Totally.

[Once again this episode is sponsored by Braintree. Go check them out at BraintreePayments.com/RubyRogues. If you need any kind of credit card processing or payment processing in general, they are a great way to go and we appreciate them sponsoring the show.]

[Hosting and bandwidth provided by the Blue Box Group. Check them out at BlueBox.net.] 

[Bandwidth for this segment is provided by CacheFly, the world’s fastest CDN. Deliver your content fast with CacheFly. Visit CacheFly.com to learn more.]

[Would you like to join a conversation with the Rogues and their guests? Want to support the show? We have a forum that allows you to join the conversation and support the show at the same time. You can sign up at RubyRogues.com/Parley.]
