076 RR Service-Oriented Design with Paul Dix
Published on: October 24, 2012
The Rogues chat with Paul Dix about service-oriented design.
JOSH: Paul, can you give us spoilers for the rest of the book?
PAUL: Er… let’s see. I think I actually have to look at the table of contents.
AVDI: Do they finally get together at the end?
PAUL: Well, they go to the top of this big fiery volcano like thing and they throw the ring in.
JOSH: And it turns out that they are secretly brother and sister.
[Hosting and bandwidth provided by the Blue Box Group. Check them out at bluebox.net]
[This episode is sponsored by JetBrains, makers of RubyMine. If you like having an IDE that provides great inline debugging tools, built in version control and intelligent code insight and refactorings, check out RubyMine by going to jetbrains.com/ruby]
[This podcast is sponsored by New Relic. To track and optimize your application performance, go to rubyrogues.com/newrelic]
CHUCK: Hey everybody and welcome to episode 76 of the Ruby Rogues podcast. This week on our panel, we have Avdi Grimm.
AVDI: Hi, I’m Avdi. Head chef at rubytapas.com
CHUCK: We also have Josh Susser.
JOSH: Hey good morning everyone. Every day is an adventure!
CHUCK: I’m Charles Max Wood from devchat.tv and this week we have a special guest and that is Paul Dix.
PAUL: Hi everybody, I’m Paul Dix. I am the co-founder of Errplane and also the author of the book for this week.
CHUCK: Awesome. So if you didn’t know, we are doing a book club this week, we are going to be talking about “Service Oriented Architectures” and I think the full title is “with Ruby and Rails”. So let’s get started. First, definition.
CHUCK: Actually before we get started, do one of you guys want to talk about the Best of Parley?
JOSH: The best of parley, there was so much stuff on Parley this week.
AVDI: There was some really good stuff. The funny thing is, like the first thing that comes to my mind was a tangent. There was a really interesting conversation about– I forget what now, but somebody posted some really nicely formatted source code in their email. And that led to some questions about how they did that, and it turns out that there’s a browser plug-in or a browser extension that lets you easily type Markdown into your email. So, that is like the first–
JOSH: Chris Hunt’s reply about how we would do date formatting.
AVDI: Oh, yeah. Right.
JOSH: Yes. That was a pretty impressively formatted email.
AVDI: That was like the first thing that sprang into my mind. Also, there has been an interesting thread about what people did for RailsRumble.
JOSH: Oh, yeah some cool stuff there.
AVDI: And yeah there’s been a bunch of cool stuff.
JOSH: OK, so for people who are just tuning in, Parley is the Ruby Rogues listeners’ email list. It’s a way to pay $10/year to support the show, and also to get on a private forum where you can talk spam-free with the Rogues and other listeners. So, that’s it for our plug. Rubyrogues.com, go sign up.
AVDI: Maglev and Object Prevalence stuff, that was a cool thread too. Yeah, it’s a pretty good week.
CHUCK: Yup. Alright well let’s get into the topic then and talk about Service Oriented Architectures. So, somebody was calling for a definition and I will let our definition master take it away.
JOSH: Well, I liked that in the book, Paul distinguished between calling something “service oriented architecture” versus– and why he used the term “service oriented design” in the title of the book. And that service oriented architecture has been polluted by the association with SOAP, Java and XML and all those things. So, rather than steal Paul’s, I’m going to toss this at Paul and ask for a definition of “service orientation”.
PAUL: Yeah. When I talked about service oriented design, it’s really about the idea of taking a large complex application which can consist of many different parts. And you know, typically Rails, you have this monolithic application where everything is contained in the same code base. You are talking about your models, your views, your controllers and even stuff like your workers that do background processing.
And the idea around service oriented design is to take the parts that may have different properties in terms of scalability or the APIs that they need to provide, and separate them out; like pulling them out into a different code base, possibly a different platform, a different language, and they can use different underlying storage systems as well.
JOSH: I remember hearing about service oriented architecture like 5 years ago. So, it’s definitely not brand spanking new. People have been doing this for a while. How far back does this sort of thing go?
PAUL: To be honest, I don’t know. Service oriented architecture as a phrase, I’m guessing, probably goes back to the 90’s, but the idea of splitting complex systems out into separate pieces that have some sort of communication channel that allows them to sync up with each other or work together, that’s obviously something that’s been around for a long time. And really when I talk about service oriented design, I’m talking about that overall concept of splitting things out and providing a layer of abstraction around some underlying complexity. And then presenting some sort of interface or API, either through web services or through a messaging system or whatever the thing may be. Whereas I felt like service oriented architecture was too specifically tied to this idea of XML based web services that were usually SOAP or something like that.
JOSH: Right. Yeah. So, I think a lot of people who have built Rails applications have done some level of service oriented design in their application. And I think there’s two places where people usually get started doing that; one is doing a background job for sending emails and the other is hooking in Solr to do text searching.
PAUL: So, the email thing is funny to me because I guess I’m not sure if that counts as service oriented design. If you are talking about using Mailgun or something like that, or an external email service provider, then it’s service oriented design, but you are not the one doing the design. Like, you are just interfacing with another service. The background workers themselves, in my mind, if you have that code inside your Rails application code base and it’s using the Rails models, even if it’s running in a separate process, that’s not service oriented design. That is still a part of the whole thing.
Solr, I would agree completely, that is service oriented design, because then you are talking about having this separate process running on maybe a separate system, and it provides a clear service that you are accessing. So, Solr’s one; Amazon Simple Email Service is another, or Mailgun or whatever. But the idea that I put in the book is about designing your own services for specific business logic or things that you are doing. It just so happens that things like full text search and sending emails, and stuff like Twilio sending text messages, lend themselves well to services, and email and text messaging also lend themselves well to third party services.
JOSH: I should probably mention S3 the thing that probably everybody does. [laughs] You’d count S3 as that kind of service too?
PAUL: Absolutely. I think Amazon’s model with Amazon Web Services was a very big inspiration for me as I was writing the book, and also their model of how they approach things internally. I use it as an example: when Amazon first started, they had this monolithic application, and they switched over to like this weird two-tier architecture with like a database and all that other stuff. And then, at some point, their code base was getting so large, and it was getting so hard to deploy new features and get anything done, that they knew they had to move to something that broke these things out into different pieces, but they weren’t sure how to make that move.
Essentially, I’ve heard — speak on this and say they basically laid out a mandate that said, “All new code that’s being produced is going to be a service”, which means it’s going to run independently and it can be network accessible. And then, they just went down this road of breaking things out so that, you know, the recommender system is just a specific service that you can call out to and say, “get me recommendations for this user or for this item”; you know, the catalogue service or the actual cart checkout service. And then much later, they branched out to the actual Amazon Web Services, which they provide out to everybody else.
CHUCK: Right. So, one thing that I’ve run into talking to people about this, and I actually gave a talk last week about service oriented architectures, is that people ask about latency between the services. So, for example, if you have all of your code being handled on the same server within the same system, then it’s obviously faster than calling out to a service to get information and then coming back in and providing it to the user. What do you usually tell people to address that concern?
PAUL: So, with that, I think there are few different levels. If you have the situation where one service calls another service, calls another service, and you have that all in the request to response pipeline then you know, that is something you want to avoid, right?
AVDI: You have a term for that, right?
PAUL: What’s that?
AVDI: You have a term for that in the book, right?
PAUL: I mean it was really just like the depth, the call-depth.
AVDI: Call-depth, yeah.
PAUL: Essentially, you want to avoid having a large call-depth because even if every service is really fast, returning in under 10 milliseconds, it all adds up. So that’s one thing that you have to look out for. But at the same time, I don’t necessarily think that in all cases it’s faster just to do it on a single machine because– caching is a perfect example, right? In-memory caching. It’s much more efficient to have ten computers that are providing the caching service than to just put everything on disk and have it all fed from a single machine. So, even in that case, network latency is still better than disk read latency.
CHUCK: Right. That makes sense. One thing that I wanna point out with your call-depth is that, most of the time when you’re splitting things up into an SOA, one of the things you are trying to do is lower the complexity of your code. And so if you have a service that calls into a service, that calls into a service, that calls into a service, you are probably not solving that particular problem.
PAUL: Yeah. I definitely agree with that. I think as you are designing something, if you’re designing a service oriented architecture, it’s good to have a higher level view of where things are going. And if you have like this crazy spaghetti of all these arrows pointing in different directions, you may need to rethink how you are doing it. [laughs]
JOSH: [laughs] I was thinking about this as I was reading the book. One of the things I’ve heard talked about a lot in SOA is that you can start transitioning a monolithic application to a service oriented design by taking just pieces of the code in your application and putting them behind some kind of interface, some abstraction. And then, after all the access to that code goes through this interface, you can move the implementation of it to a service.
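The transition Josh describes can be sketched in a few lines of Ruby. This is only an illustration, not code from the book: the class names, the `for_user` method, the URL path, and the JSON shape are all assumptions, and the HTTP call is replaced by a lambda so the sketch runs without a network.

```ruby
require "json"

# Step 1: the logic still lives in the application, but callers only
# know the interface (#for_user). All names here are hypothetical.
class LocalRecommendations
  def initialize(purchases)
    @purchases = purchases # { user_id => [item, ...] }
  end

  def for_user(user_id)
    @purchases.fetch(user_id, []).first(3)
  end
end

# Step 2: same interface, but the implementation now lives in a service.
# http_get is any callable that takes a path and returns a JSON string;
# in a real app it would wrap an HTTP client.
class RemoteRecommendations
  def initialize(http_get)
    @http_get = http_get
  end

  def for_user(user_id)
    body = @http_get.call("/users/#{user_id}/recommendations")
    JSON.parse(body).fetch("items")
  end
end

# The caller depends only on the interface, so it does not change
# when the implementation moves.
class HomePage
  def initialize(recommendations)
    @recommendations = recommendations
  end

  def render(user_id)
    "Recommended: " + @recommendations.for_user(user_id).join(", ")
  end
end

local = LocalRecommendations.new({ 42 => ["book", "lamp"] })
fake_http = ->(_path) { JSON.generate({ "items" => ["book", "lamp"] }) }
remote = RemoteRecommendations.new(fake_http)

puts HomePage.new(local).render(42)
puts HomePage.new(remote).render(42)
```

Both calls print the same line; only the object handed to `HomePage` differs, which is the point of putting the code behind an interface first.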
PAUL: Yeah, that’s right. And actually, I think in the very beginning of the book, I actually say, when you are starting off a new application, you should avoid service oriented design because it adds complexity and it adds development time to getting things done. If you are just starting to get an application out, you generally don’t know which features are going to be kept and which ones are going to be thrown away. So really, the thing that you optimize for the most is iteration speed. Services, for the most part, reduce the iteration speed. Where I’ve found that’s not the case is if you are talking about very large teams that need to coordinate, because when you get to a very large team, the communication overhead is too high. So, you have to figure out some way to logically break things down. And the other case is very complex, very large code bases. It’s easier to think about them if you have these abstractions.
So, I think background work is generally one of the first things that people look to as far as separating out into a service, right? If you have some sort of application that you are building that has to go out and update and fetch from a bunch of external feeds, when you first start out, you might do this just as Resque workers that are accessing your models directly and all that other stuff. And that’s great, that will get you pretty far. But then, you get to the stage where the complexity behind those Resque workers is getting greater and greater, and the logic behind when they update and how they update is getting worse. And also you find that maybe the SQL database that you are using for your entire app isn’t the appropriate place to store that raw data or process it. That’s the time you wanna say, “Maybe we should separate it out into a service” and provide a clear API where you can say like, “OK. Stop crawling this thing or update this thing or do whatever this is”, and then underneath the scenes or under the covers, you know, it’s doing all sorts of different things, probably using a messaging system and SQL database to store the raw data or iterate over it. But the point is, all those things are just concealed behind a very clear API that you provide.
JOSH: OK, that makes total sense. There are like 8 things to talk about in your description there, but the thing that got me off on that track was that, when you put all that stuff behind the interface and then you can’t really tell if it’s a part of your local code base or a remote service, does it take extra effort in the code to make allowances for the extra latency in talking to the remote service?
PAUL: Generally, I would say it does. You know, essentially, when you are talking about a remote service you have two types; one is synchronous, which means it needs to make a call and it needs to get some sort of response, so that you can return something to the user that is making the request. And of course the other is asynchronous. So, that’s why I say background things are usually prime candidates for the first things that you want to put into a service, because they are usually asynchronous, which means you can just kick them off. And as long as there is some call that you can make to that service, in a synchronous fashion, that’ll tell you the status of that background thing, then you’re good. And I would definitely say that when you are pulling things out into services, for anything that is synchronous, part of the contract of the service is not just the API that you provide, but the uptime you are providing and the guarantees around how quickly you’ll send the response.
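A minimal in-memory sketch of the two call types Paul distinguishes: an asynchronous call that kicks off background work and returns immediately, plus a small synchronous call that reports the status. The `FeedUpdateService` class, its method names, and the status strings are hypothetical, and `run_pending` merely stands in for whatever worker or queue would do the real work.

```ruby
# Hypothetical service object; in practice these methods would sit
# behind HTTP endpoints or a messaging system.
class FeedUpdateService
  def initialize
    @jobs = {}
    @next_id = 0
  end

  # Asynchronous call: record the request and return a job id at once,
  # without waiting for the feed to be fetched.
  def start_update(feed_url)
    @next_id += 1
    @jobs[@next_id] = { feed_url: feed_url, status: "queued" }
    @next_id
  end

  # Synchronous call: cheap lookup that tells the caller where the
  # background work stands.
  def status(job_id)
    @jobs.fetch(job_id)[:status]
  end

  # Stand-in for the background worker that processes queued jobs.
  def run_pending
    @jobs.each_value do |job|
      job[:status] = "done" if job[:status] == "queued"
    end
  end
end

service = FeedUpdateService.new
job_id = service.start_update("http://example.com/feed.xml")
puts service.status(job_id) # status right after kicking the job off
service.run_pending
puts service.status(job_id) # status after the worker has run
```

The caller never blocks on the update itself; it only needs the status call to be fast and reliable, which is the part of the contract Paul singles out.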
AVDI: Right, it makes sense.
CHUCK: So, I have a question. It seems like in some cases, people start talking about SOA when their application reaches a certain size. So you know, they start out, they’ve got this little app that does something that is pretty easy to manage, to keep in your head. And then they move up to kind of a medium size application where, you know, it’s still generally easy enough for the team to keep track of everything that’s going on. And then they kind of reach that large or monolithic stage where it’s like, “Okay, now I’ve got to go and figure out a whole bunch of stuff before I can add a feature”. It seems like that’s usually when they get into SOA. Is there a good time to do that? Is there a good way of gauging that? Or is it just when you see something that is easily or simply split off into a service, because it has separate concerns and separate functionality, that you split it off?
PAUL: I would say generally the good time to do that is when the pain becomes so great [laughs] that you have no choice but to start splitting things off. I mean, it’s great for some things when you say, “OK, this is definitely something that can be self-contained. There is no reason for it to be part of the rest of the code base.” And you know, the example that Josh provided, Solr, like full text search, that’s a perfect example of something where it’s easy to say it should be totally separate.
But I mean the other thing is, I’m a real fan of only making services around long-lived features. I’ve had a bunch of friends in start-ups where they are building this Rails code base and they start developing all these features and their code base gets just bigger and bigger. And the tests take longer and longer to run, but it still doesn’t make sense to break it out into services. Because one of the problems is that half of the features aren’t even getting used. And it’s like, why bother spending time to reengineer those things if they are not getting used? So, I like taking things that are core to your business or to what you are doing, that you know are not going to change, and pulling them out into a service. And then you can highly optimize it and just make it really, really rock solid.
JOSH: Sounds good.
CHUCK: One other question that I have is, I mean, when I’ve done SOA, it seems like for the most part, I’m using Rails on the front end and then the majority of the services I build in the back end of it, if I’m doing like synchronous HTTP calls, I’m using Sinatra. Is that generally how you approach things or do you use different types of technology for the different layers?
CHUCK: That makes sense.
AVDI: So, there is something I wanna talk about a little bit, which is deciding how to partition services, like where to draw the line between services. You talked a bit about that in the book. You mentioned a few rules, if I recall correctly: looking at, for instance, whether you have a lot of reads and few writes, so if you divide things up you have different services handling those, so that the occasional writes aren’t interfering with the performance of all the reads. You have some other strategies. Now, something that I’ve seen a few times in projects that have tried to go kind of service oriented is they’ll have basically layers, I guess you can look at it as.
And so, like I worked on a project where there was a website, there was a command line executable that talked to the services, there was an API front end and back end, and there were various workers. And what we spent a lot of time doing was we’d add feature x and we’d add a way to access feature x in the command line executable, and we’d add something to the website for feature x, and then we’d have to add a new API call for feature x, and then we’d have to add a new thing on the back end for feature x. What was basically happening is that we were changing every service or every piece of the architecture every time we added a new feature. And Uncle Bob on the 8th Light blog had a good article about this a while back, sort of noting the same thing and basically saying, “When you divide up your services that way, it’s a violation of the single responsibility principle.” The principle is keeping things that change together, together. So if you have a bunch of things that change together, find a way to slice things differently so they are still all together. And I’m curious if you have any insight into this, because it seems like some of the splitting strategies based on load and stuff like that would wind up with that issue of having to change multiple services whenever you add a new feature.
PAUL: Yeah. I mean you can’t segment based off load alone. Usually, I think along the lines of logical function, which is the idea of keeping things that change together, together. The reason I singled out load is because I personally have had the experience where I’m building applications from scratch, and I have some specific thing, like some specific type of data that is going to get written in. And this is the time series example I was talking about earlier, where I know I’m going to be writing a ton of time series data and I need some API for that and it needs to be backed by something else. Now, the metadata around that time series data can change all over the place, but it’s not going to change the core API of writing in time series data. If it is going to change the query API, then you need to rethink how the API is designed.
To the point about if you change a service and then you have to go through and update the command line utility and update the other web thing, I think that is a separate issue. I don’t think there’s any way for that to go away. I think that’s going to happen. Take mobile applications, for example: you have the native iOS app, you have a web mobile app, you have the Rails app, and maybe you have a native Android app, and then you have maybe a command line client. If you add a new feature to a service, you’re going to have to go through and update all those service clients to either use it or not. Well, if they don’t use it, then ideally you wouldn’t have to update anything, but if they do want to use it, you are obviously going to have to update them. And the way you update them is going to be different for each one. I almost feel like, I don’t wanna go the route of like I trust these or anything like that or was it like web service definition language?
AVDI: So one of the things I liked about the book is that in your REST examples, you do encourage (to some degree) discoverable services. So, you have URLs going into the responses and you actively discourage having clients that have to construct URLs. So you are definitely talking about– you are not talking about like — or any kind of schemas, but you are talking about having resources that describe themselves and describe how to get to other resources or even to other states, which is pretty cool.
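A small sketch of what Avdi is describing, where the response carries the URLs and the client follows them instead of building them. The JSON shape, the `links` key, and the paths are assumptions for illustration, not the format used in the book.

```ruby
require "json"

# Hypothetical response body from a service: the resource describes
# itself and says where its related resources live.
response_body = JSON.generate({
  "id" => 7,
  "title" => "Some Book",
  "links" => {
    "self" => "/books/7",
    "reviews" => "/books/7/reviews"
  }
})

# The client looks up a link by name rather than constructing
# "/books/#{id}/reviews" itself, so the service is free to change
# its URL layout later.
def link_for(body, rel)
  JSON.parse(body).fetch("links").fetch(rel)
end

puts link_for(response_body, "reviews")
```

A client written this way is coupled to the link names, not to the URL structure, which is the property that makes the service discoverable.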
JOSH: But I notice you didn’t really talk about ActiveResource. Do you have anything you wanna say about ActiveResource and technology choice in doing these services?
PAUL: I guess at the time, I don’t remember– like I looked at ActiveResource and it just seemed to me that it was trying to provide like a low-level model abstraction where you just call these things, but in my mind, like I said, I don’t feel as though the Rails REST API is a legitimate way to design a service that you want to create. A lot of times there are things that you want to include automatically. You don’t wanna make a service call every time you want some individual piece of data. Basically, I just found that ActiveResource didn’t feel right. And I also didn’t think the performance of it was very good.
AVDI: It seemed to me that ActiveResource is trying to make a really fine-grained API. You know, basically make it look very similar to the calls and queries and things that you would make to a database. And I don’t know how you feel about this, but I think one of the strengths of RESTful SOA has been the fact that it has kind of enforced high granularity, or no, low granularity. You know, services that are not super, super granular. Because I think one of the bad roads the whole — community went down back in the day was, “we are going to expose all of our objects, all these individual tiny remote objects, as if they were local objects”, and that’s just a terribly leaky abstraction. You can’t pretend that you can interact with hundreds of remote objects, individual records, in a way that’s performant and behaves exactly like they are just local. So, I kind of like the fact that these RESTful services have larger requests, bigger requests that more often bundle more things together.
PAUL: Yeah. I agree with that completely. I think that, including like the performance concerns about making method calls or whatever, I think it’s important when designing a service to say like, “This is the actual API we want to expose.” I mean, I guess you can say it like, “We are going to write an object and you can call those methods on the object”, but yeah it definitely strikes me as a bit of a leaky abstraction.
AVDI: Plus it couples, you know, the more granular your API is, the more your clients are going to be coupled to your exact architecture right now. So you know, with the ActiveResource approach, when you are basically exposing your records, exposing your resources exactly as they stand in the application, that is saying that I expect that this inner structure is going to stay the same forever.
JOSH: Let’s see, so what else? We were talking before about sort of the– so this book has been out for like 2 years or more now?
PAUL: Yeah, that’s right. It was released in August of 2010.
JOSH: Yeah, and so it was quite timely, and I think if you just look at the talks that people give at conferences these days, so much of what they talk about is service oriented design. We had two talks at GoGaRuCo this year about that. And so, it’s definitely a topical subject [laughs] or current subject that people are very interested in.
CHUCK: I can just attest to that really quickly because when I spoke at Aloha Ruby Conference, I had probably half a dozen or more people come up right after my talk and say, “This is pain that we have right now”. And so, it’s definitely something that people are running into and trying to solve. Especially with some of the legacy apps, where this concept wasn’t something that they’d even considered for
JOSH: Right. So 2 years, that’s a really long time in internet technology. We were talking on the pre-call, and you were saying stuff has changed since then and you have more experience. What is it that you’ve learned that you didn’t get to put in the book because you didn’t know it then?
CHUCK: Yeah how are we doing it wrong now?
JOSH: Yeah. [laughs] What’s the new hotness?
PAUL: One of the things is that I have a single chapter in the book about messaging. And it covers specifically RabbitMQ, and I’m not really using Rabbit anymore, I’ve moved on to other technologies. But in my mind, messaging is so much more at the core of service oriented architecture than I give it credit for in the book. And really that stems from my experience working in a couple of very large organizations that have a lot of people. And the thing is, when you come in and you have this small team and you just wanna quickly build something, one of the biggest pain points you’ll have is trying to interface with other teams or trying to get at the data or something like that. I feel like designing a proper messaging architecture and a proper data flow architecture will kind of ease the pain on a lot of these things.
For example, say you just had a policy where anything that happens in the application, either a user directed event or a model update or something, anything that happens at all has to be an event that gets written to a messaging system that anybody can read off of. And furthermore, there is some other thing that aggregates all those events off the messaging system and dumps them into like the canonical store. Which could be like a deal for something like that, but the point is it’s not the active store, it’s just a store that is available so that other people can get at the data later on and build services on top of it.
That’s one of the things that I would talk a bit more about, this idea that a lot of times when you are building a service, you are building it after the fact. Like the feature is already developed. It’s already out there, and what you are trying to do is find a way to abstract it. And one of the key hurdles you have at the very beginning in doing that is getting at the data and making it accessible in a sensible fashion. And if you have a messaging system like that, you can create that service and have it update in real time with all the data that is coming through, without affecting anybody else in the architecture, and that way you can kind of slowly transition onto using it.
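The policy Paul describes can be sketched with a tiny in-process publish/subscribe object. This is an assumption-heavy illustration: a real system would use a message broker and a durable store, and the `EventBus` class, the event names, and the payload fields are all made up for the example.

```ruby
# Hypothetical in-memory stand-in for a messaging system.
class EventBus
  def initialize
    @subscribers = []
  end

  # Anybody can read off the bus by registering a handler.
  def subscribe(&handler)
    @subscribers << handler
  end

  # Everything that happens in the application is written as an event.
  def publish(type, payload)
    event = { type: type, payload: payload }
    @subscribers.each { |handler| handler.call(event) }
    event
  end
end

bus = EventBus.new

# Aggregator: dumps every event into the canonical (not active) store.
canonical_store = []
bus.subscribe { |event| canonical_store << event }

# A service that updates in real time from the same stream.
signups_by_plan = Hash.new(0)
bus.subscribe do |event|
  signups_by_plan[event[:payload][:plan]] += 1 if event[:type] == :user_signed_up
end

bus.publish(:user_signed_up, { plan: "free" })
bus.publish(:model_updated, { model: "Post", id: 1 })
bus.publish(:user_signed_up, { plan: "pro" })

# A service built after the fact can be seeded from the canonical
# store without touching the code that produced the events.
model_updates = canonical_store.count { |event| event[:type] == :model_updated }

puts canonical_store.size
puts signups_by_plan.inspect
puts model_updates
```

Neither subscriber knows about the other, and the publishing code knows about neither, which is what lets a new service be added without affecting the rest of the architecture.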
JOSH: That’s a pretty cool concept. I mean, have you built a system that way or wish you built a system that way?
PAUL: I definitely wish I built a system that way. I’m trying to get in to the habit for new systems of doing things like that. It’s kind of hard in the beginning to do that because then you are talking about quite a bit of extra architecture but yeah.[laughs]
CHUCK: Yeah, but like you were saying before, you just kind of start out with the simple case and then work your way into the more complicated case. Unless you have a large team like you were saying, where you can have a couple of people just figuring out that infrastructure and architecture while everybody else builds awesome features.
PAUL: Yeah, and I mean the approach I always advocate is start out simple. Don’t get crazy with the architecture because the chances are you are just going to throw it away. So, there’s that. I guess in the book, I have quite a bit of focus on specific code examples, like there’s Sinatra and there’s Rails. Rails 3 I think was in beta at the time, barely in beta at the time I was writing it, so there are Rails 2 and Rails 3 examples. And examples that focus on a framework or whatever don’t age very gracefully. I think there’s room to talk a lot more about design and bring in more specific examples and case studies about, “OK, here’s how you break things up”, instead of focusing as much on specific code to create the thing.
JOSH: One of the things that I was thinking about as I was reading the book was that Rails, I think, did an amazing job when it first appeared on the scene. It did this amazing job of making database backed web applications really easy to do, because it just abstracted away a lot of that stuff and it gave you really good libraries and support tools for building these things. And people had been building database backed web applications for a while at that point, but suddenly it was a game changer. A lot more people could do that thing.
And the service oriented design that you talk about in your book and that a lot of people are trying these days, it seems like we are in the same sort of time period of pre-Rails where everybody is doing it or everybody is learning what are the right ways to do it. And have you thought about trying to extract this learning and put it in some kind of framework that makes it accessible to a lot more people?
PAUL: I totally agree that it feels like we are in the pre-Rails days of this kind of thing. I think that’s what I’m talking about when I’m thinking about what the next wave is going to be. Like, what is the next Rails? And I think it’s going to be a framework that makes creating service backed applications trivially easy. And it seems to me like the people that have been leading the way on that for the last year or two have been people who are creating mobile applications, right? But the thing is, those are all proprietary, like — and all those other ones. I’m really interested in what’s going to become of open frameworks that really focus on this idea of creating service backed applications, where the application itself could be three applications, right? Different mobile platforms or a web app or whatever.
JOSH: OK. But you don’t know of anybody who is working on that or someone who would like to go off and do that?
PAUL: Sadly, I’m unavailable for doing that. But I think, you know, I’m not sure– I’m curious about what MeteorJS is doing. I mean, they are doing the real-time thing, but it seems to me to make sense, like they are already committed to being an open source thing and it seems like they would be in a position to actually create a framework like that. I’m a little sad because honestly, I think the community that is most likely to come up with something like this at this point is the Node community.
JOSH: I was expecting you to say that. [laughs]
CHUCK: This podcast and my other podcast are going to have a fight now.
JOSH: Cool. OK, so Paul, I have one or two little questions from the book. So, all of your stuff is built around HTTP and pretty much REST, and I noticed that you used the status code 400 to indicate sort of the generic error condition for a request. And it seems like people doing RESTful servers or services in Rails mostly use 422 as the code. I’m looking this up, and I think what it indicates is that 400 is about bad syntax and 422 is about bad semantics. Do you still do 400 now in your services? Do you think that is a good choice? How does that compare with the 422, or is there just– is this a niggling detail and I should shut up?
PAUL: I think I– So the thing is, when it comes to RESTful design, I am not very religious or dogmatic about it. And I feel the same way about testing and all these other things, which is, I try to be as pragmatic as possible, and if a 400 works then it works. I don’t care about whether it’s a 422 or not; if people think that makes more sense, that’s what I’ll use. I actually hadn’t seen that at the time I was writing, so if that’s what people are moving towards as their standard for writing RESTful services, that’s definitely what I would use going forward, but I hadn’t so far. So far, I’ve still been using a 400 as just a generic thing.
Actually, I did wanna loop back real quick to things I would update about the book, because it reminded me of when you said that most of it is focused on REST and HTTP services and all this other stuff. And I would say, if I went through it again, I would probably say that for internal services, don’t be afraid of actually creating, you know, a regular TCP-based service with a line protocol. Don’t be afraid of actually defining a protocol. Because in the end, when you are talking about RESTful services, you are still defining a protocol; the only difference is it’s defined through the URI and the data that you pass back and forth.
I guess one thing that I became interested in, I’d say about a year ago, for doing that, for saying, “OK, I’m going to do services, but I’m not going to make them HTTP REST-based services”: I looked into ZeroMQ, which I still find pretty interesting, but I’m not sure if it’s totally the way to go. But with my Go stuff, recently I’ve been programming in Go for that, I’ve been doing line-protocol-based services. And I found that it’s just a lot lower overhead and I’m just defining everything very clearly from day one.
JOSH: But they are still TCP based?
PAUL: That’s correct. I mean, these are for services that are only internal to the architecture. For anything that’s exposed to external clients, like 3rd parties, I’d still make those HTTP-based services.
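As a rough illustration of the kind of internal, line-protocol TCP service Paul describes, here is a minimal Ruby sketch. The protocol itself (the `INCR` and `GET` commands and the `OK`/`ERR` replies) is invented for this example and is not taken from the book or from Paul’s Go services; the helper name `handle_line` is likewise hypothetical.

```ruby
require "socket"

# Handle one line of a made-up protocol:
#   "INCR <key> <amount>"  or  "GET <key>"
# Replies are single lines starting with OK or ERR.
def handle_line(line, counters)
  command, key, amount = line.strip.split(" ")
  case command
  when "INCR"
    counters[key] += Integer(amount)
    "OK #{counters[key]}"
  when "GET"
    "OK #{counters[key]}"
  else
    "ERR unknown command"
  end
rescue ArgumentError, TypeError
  "ERR bad arguments"
end

server   = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port     = server.addr[1]
counters = Hash.new(0)

# Serve a single connection, one reply line per request line.
acceptor = Thread.new do
  client = server.accept
  while (line = client.gets)
    client.puts(handle_line(line, counters))
  end
  client.close
end

socket = TCPSocket.new("127.0.0.1", port)
socket.puts("INCR hits 3")
first_reply = socket.gets.chomp
socket.puts("GET hits")
second_reply = socket.gets.chomp
socket.close
acceptor.join
server.close
```

The point of the sketch is only that the whole contract fits in a few lines of text-per-request, which is the “defining everything very clearly from day one” trade-off mentioned above.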
JOSH: OK you haven’t hit the point where you need to do a UDP based service yet.
PAUL: I put something up briefly. It was for time-series-based data, where it’s like you don’t really care. Like, you are just trying to get the data in aggregate. [laughs] So I’ve done that for things like where you are doing log aggregation or a bunch of metrics that you are tracking and you don’t care about– like, it’s okay when something gets lost.
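A fire-and-forget metric over UDP might look like the following Ruby sketch. The payload format (`page_views 1`) is an assumption made up for illustration; the sender never waits for an acknowledgement, and the receiver simply gives up after a short timeout, which mirrors the “it’s okay when something gets lost” attitude.

```ruby
require "socket"

# Receiver: bind to a free local port chosen by the OS.
receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0)
port = receiver.addr[1]

# Sender: one datagram, no connection, no acknowledgement.
sender = UDPSocket.new
sender.send("page_views 1", 0, "127.0.0.1", port)

# Wait up to one second; a lost datagram just leaves payload as nil.
payload = nil
if IO.select([receiver], nil, nil, 1)
  payload, _sender_info = receiver.recvfrom(1024)
end

sender.close
receiver.close
```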
AVDI: So Josh mentioned error returns, and I liked the little point you made about including error codes. Including application– like, you’ve got the 400 error or 422, whichever, and that comes back with, for each exception you have, an actual internal error code. Can you explain why you do that?
PAUL: Yeah. For that, I actually started doing it initially because guys who were programming in statically typed languages and accessing my services wanted that, because they wanted to know what particular piece of business logic wasn’t met that caused the error. And they wanted to do a match based on a code as opposed to some human-readable string or whatever. And I just carried it forward from there and I thought, “OK. Well, if there is a well-defined case that I know about, it’s kind of like an enum or whatever, why not provide an actual code that other programmers can match against?”
AVDI: Yeah, which is a lot more reliable than matching against an error message that might change.
PAUL: Yeah exactly.
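To make the idea concrete, here is a small Ruby sketch of an error response that carries both an HTTP status and an application-level code. The code numbers, the `ERROR_CODES` table and the JSON shape are all hypothetical; the book’s own format may differ. The status defaults to 422 here, but passing `status: 400` would work just as well for the pragmatic approach discussed earlier.

```ruby
require "json"

# Hypothetical catalogue of application-level error codes.
ERROR_CODES = {
  email_taken:    1001,
  name_too_short: 1002
}.freeze

# Build the HTTP status and JSON body for a failed request.
def error_response(key, message, status: 422)
  code = ERROR_CODES.fetch(key)
  body = JSON.generate("error" => { "code" => code, "message" => message })
  [status, body]
end

# A client matches on the stable numeric code, never on the message text.
def email_taken?(body)
  JSON.parse(body).dig("error", "code") == ERROR_CODES[:email_taken]
end

status, body = error_response(:email_taken, "That email address is already registered")
```

Because the client checks `code` rather than `message`, the wording of the message can change (or be translated) without breaking callers.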
JOSH: I have a totally different tangent here. I’ve been wanting to bring this up the whole show. And that is: are you familiar with Conway’s Law?
OK, so Conway’s Law, it comes from the ’60s and it says that “Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” And the most clever rephrasing of that that I’ve heard is that “A group of 4 teams will produce a four pass compiler.”
CHUCK: A four pass compiler?
JOSH: Yeah. [laughs] If you have a team doing a compiler and there are four teams on it, then you get a four pass compiler. So, I know that in the book you talked a lot about agility and the development process and how teams can be structured to work better on service-oriented applications. How big of a deal is it when you are taking your developers who are used to working around one big application and breaking them into smaller teams that can work on smaller components and services that talk to each other? How much of a challenge is that? And how much do people have to plan for that?
PAUL: Honestly, I think the challenge is mainly a top-down challenge, which means I think the only people who have the power to effect change are the people who actually set the team structures, and they would have to enforce it. So, we are talking about the CEO of the company or somebody fairly high up.
So before I get to that, let’s talk about the first point, which is Conway’s Law: if you have four teams, you are going to have four systems, four services. That is one of the reasons I talk about long-lived services, things that are fairly well defined. Because ideally, if it’s something that is going to be around for a very long time and it’s fairly static, that’s another indication that that’s a good service. Like, that is a good split, if you have this set of functionality that’s fairly static. It’s not going to change. And that’s a service that you can create, and at that point, it moves into operations mode. So, it becomes entirely feasible to have more services than you have teams, right?
Although generally, I would agree with the idea that the architecture of the system tends to reflect the political landscape of an organization. So, the thing about putting people into teams is that I’ve become fairly opinionated about my hate for large teams. I’ve worked on teams as small as just me, or teams that are 20 or more contributing to the same thing. I guess the largest that I ever worked on is 2,000, but they weren’t all contributing to the same code base. But I find that as the team size grows, if you have all the people contributing in the exact same code base, it just becomes really, really hard to get things done. Like, to create new features and deploy them. And a lot of people have come up with ways to kind of ease that pain. Generally they are around really hard-core process, like nothing goes in unless there’s a test suite and tests written around it. In that way, you know, when you deploy something new, when you are working on a new feature, you can be sure that it’s going to work, because otherwise the tests are going to break.
But it still doesn’t help you with the fact that you may have thousands of source files and hundreds of thousands of lines of code, and you have to jump around all over the place to figure out what’s doing what. So, I’ve become a fan of this idea that you have to, like, politically enforce a max team of six. And I think the ideal team size is somewhere around 6 to 8.
JOSH: Yeah. I agree with that.
PAUL: And I also believe that those teams have to be cross-functional, which means that they should be able to deliver a feature, ideally a user- or customer-facing feature, from start to finish without having to pull in a separate team. That’s not always the case if you are talking about infrastructure teams, but I think you have to get to a fairly large level before you can talk about actually having infrastructure teams. In that case, they are still cross-functional, but your customer is other people in the organization that are going to use your infrastructure.
AVDI: So, you are keeping the team’s call depth low as well?
PAUL: Yeah, [laughs] I mean, it’s–
CHUCK: It’s almost like somebody said that the structure of the program mimics the structure of the teams. So you keep the team call depth– Yeah, no, never mind.
PAUL: No, it’s less about that. It’s more that a team is a full mesh network and you have a limited amount of bandwidth for everybody to communicate. So keeping the team size small is good.
And the other is, a single programmer can only write so much code. So if you keep the team size small, you are not going to get single code bases that are insanely massive. Like, I would rather have– like, the RubyGems ecosystem is a great example, right? When you go write a Rails program or whatever, you’re actually dealing with god knows how many lines of code. But you don’t really have to worry about all that, because it’s abstracted out in separate gems written by separate teams, and you just have to worry about the interfaces into those, for the most part, unless you find problems with the libraries you are using. But I like that idea of separating things out and taking these different pieces and putting them together to build something else.
JOSH: Have you considered a career with masonry?
PAUL: Yeah. That is actually my fall-back. And I’m not so sure this whole programmer thing is going to work out. I think that may be a flash in the pan, so masonry is high on the list.
JOSH: Hey, you know, brick walls, they’ve been around for ages.
CHUCK: You keep your masonry teams though, too? So I have another tangent that I wanna go off on. Mainly because I think that when most people talk about SOA, they talk about the organization and they don’t talk about security. And so I was wondering what your approach to security is, as far as, you know, most of the queuing systems– well, I take that back. If you are using RabbitMQ or some of these others, you can set up some kind of authentication around who can put stuff on the queue. But as far as doing synchronous calls and things, what techniques do you typically use in order to make sure that the request is actually a legitimate request from an authorized user or whatever?
PAUL: Right. So, security is a sensitive topic because it’s obviously not going to be the same for every organization, right? But generally speaking, I think there are two types of security that we are talking about. There’s customer-facing security and then there is internal security. Now, web start-ups generally don’t have to worry about internal security, because you are hiring somebody, and if you are hiring somebody, you trust they aren’t going to take the data inside the system and use it in some, you know, in some bad fashion. Obviously, that doesn’t work if you are talking about defense contractors, or a lot of times people in the finance industry; they have bits of data that other teams aren’t able to see. And they have that for regulatory reasons as well.
So, my view on security kind of breaks down in those organizations, which is also generally why I don’t try to work with them very often. [laughs] But the internal stuff, I just like to say, keep it open to the developers. Trust that the people you hire aren’t criminals and try and get stuff done. And if you did hire a criminal, then that is unfortunate and you have to fire them and do what you have to do. But when you are talking about customer-facing security, say you have a system where the customer owns their data and they are able to control in a very granular fashion who is able to see it, then the only thing you need to make sure of is that any request that’s made for a user passes that user’s token through. And you make sure that any data that’s accessed inside the system is always accessed on behalf of that user.
And then if you have internal systems that programmers are writing, that are just doing background jobs or whatever and not doing something on behalf of a specific user, I think that’s the way I separate those things. As far as what method I use to secure it, if you are talking about providing services where other third parties can make requests on behalf of the user, the chapter on security touches on that. And it also touches on HMAC-based security, which I’m not so sure is a necessary step at this point.
JOSH: I have a different approach to the question, and that’s when you are talking in the book about, “OK great, you have built this internal service. Now you can open it up to the public as an external service.” Now, I have to admit to not reading the chapter on security, but in the earlier mention you don’t talk about any of the security-related concerns of “I have this internal service, I’m just going to open it up to the public.”
PAUL: Right. Yeah, I guess I don’t talk about the– in the security chapter I presented the two different methods for securing stuff. One is the HMAC-based method, which is signing stuff; you have API keys. You know, kind of like Amazon, where you get a key and a secret key, and you use that to sign requests. And then the other is OAuth. But neither of those approaches really talks about security of customer data, which is something you have to worry about more at the model level than at the service level, right?
JOSH: Well, there is that, but there is also what different clients are allowed to do. Like web based authorization where, you know, your internal applications, oh sure, they should be able to insert new user data, but the public using your API should only be able to view that data, not modify it.
PAUL: Right. But again, that is something that you are talking about at the model level and not really at the API level. Like, you need to validate those things. I remember, like, last week or the week before, there was this app on Hacker News, it was like, data on AWS or something, like from Dropbox to that, and they had this security flaw where they had actually exposed the AWS keys of people who signed up; like, you could just go to user/23 and see that user.
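The key-plus-secret signing approach can be sketched in a few lines of Ruby using the standard `openssl` library. This is only an illustration of the general HMAC idea: the canonical string layout, the `sign_request` and `valid_signature?` names, and the choice of SHA-256 are assumptions for this example, and it is not Amazon’s actual signing scheme or the one from the book.

```ruby
require "openssl"

# Sign the parts of a request that must not be tampered with.
# The canonical layout (verb, path, timestamp, body joined by newlines)
# is a made-up convention for this sketch.
def sign_request(secret, verb, path, timestamp, body = "")
  canonical = [verb.upcase, path, timestamp.to_s, body].join("\n")
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, canonical)
end

# Compare two strings without returning early on the first mismatch.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Server side: recompute the signature with the shared secret and compare.
def valid_signature?(secret, verb, path, timestamp, body, signature)
  secure_compare(sign_request(secret, verb, path, timestamp, body), signature)
end

secret    = "demo-secret" # hypothetical shared secret issued with an API key
signature = sign_request(secret, "GET", "/users/23", 1_350_000_000)
```

As the conversation notes, a valid signature only proves who sent the request; whether that caller may see a given record still has to be decided separately.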
JOSH: Oh man.
PAUL: I’m pretty sure almost every Rails developer, the first time they wrote a Rails app, didn’t secure against that.
CHUCK: They left the show action wide open.
PAUL: Exactly, and that has less to do with designing an API and more to do with just making sure that you have, in your model layer, security that makes sense. And I think, another thing: a proper RESTful API, well secured, for me it obscures the complexities of security, so it should just do that, and then if you make a request that is not valid, it should return an unauthorized response.
PAUL: But that’s probably not the answer that people would hope for; they would probably want to hear, like, “Oh, yeah, just use this library and the whole thing just–”
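The “user/23” problem comes down to looking records up by id alone. A minimal plain-Ruby sketch of the model-layer fix, with every name here (`Document`, `DocumentStore`, `find_for`, `NotAuthorized`) invented for illustration, might be:

```ruby
Document = Struct.new(:id, :owner_id, :title)

class NotAuthorized < StandardError; end

class DocumentStore
  def initialize(documents)
    @documents = documents
  end

  # Unsafe: anyone who guesses an id gets the record back.
  def find(id)
    @documents.find { |doc| doc.id == id }
  end

  # Safer: every lookup is made on behalf of a specific user.
  def find_for(user_id, id)
    doc = find(id)
    raise NotAuthorized, "not allowed to see document #{id}" if doc.nil? || doc.owner_id != user_id
    doc
  end
end

store = DocumentStore.new([
  Document.new(23, 1, "AWS keys"),
  Document.new(24, 2, "Shopping list")
])

own_document = store.find_for(1, 23)
```

Treating a missing record and a forbidden record the same way also avoids revealing which ids exist; an HTTP layer on top could translate `NotAuthorized` into whatever status the service has settled on.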
JOSH: Yeah, isn’t there a library to just do that?
PAUL: I’ve been using — but still you have to think about who can access what and you actually have to make the declarations and if you don’t, then you are wide open.
CHUCK: Right. Well, it looks like we are about at our time. Are there any other topics that we ought to go over before we wrap this up?
AVDI: I have one quick one.
AVDI: Typhoeus. How do I pronounce it?
PAUL: That was actually correct.
AVDI: Awesome! I’ve been doing it right all this time. Actually, I brought that up just because I wanted to say Typhoeus is a terrific library. I have used it on a lot of stuff and it’s one of my preferred HTTP client back-ends, and I also tend to use it as an example of good API design. So, very nice.
PAUL: Thank you. Actually it’s funny because if I have the chance to do that now, [laughs] I would make it very different…
PAUL: …than it is. Yeah. I wouldn’t use a native library at all. I would just use threads and the connection pool, stuff like that.
PAUL: I’d probably change up the API a little bit. I feel like there were some spots in that API where I tried to make it magical and the abstraction leaked a bit. And it didn’t provide me the kind of power that I wanted. And part of that has to do, I think, a little bit with shortcomings in Ruby as a language for designing in parallelism. It’s one of the reasons why I’m kind of enamoured with Golang now; with things built into the language for parallelism, it seems like it’s fairly clear. And if I’m going to design a parallel library, it’s easier to create an API that other programmers can understand and not shoot themselves in the foot with.
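For a sense of what “just use threads” could mean in plain Ruby, here is a small sketch of a fixed-size pool of worker threads draining a queue of jobs. It is not how Typhoeus works and not a design Paul describes in detail; `run_in_parallel` is a hypothetical helper, and the “request” is faked with a string so the example runs without a network.

```ruby
# Run each job through the given block on a fixed number of threads,
# returning the results in the same order as the jobs.
def run_in_parallel(jobs, worker_count: 4, &block)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  results = Array.new(jobs.size)

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        begin
          job, index = queue.pop(true) # non-blocking pop
        rescue ThreadError
          break # queue is empty, this worker is done
        end
        results[index] = block.call(job)
      end
    end
  end

  workers.each(&:join)
  results
end

paths = %w[/users/1 /users/2 /users/3]

# Stand-in for a real HTTP call; a real version would check out a
# connection from a pool here.
responses = run_in_parallel(paths, worker_count: 2) { |path| "GET #{path} -> 200" }
```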
CHUCK: So, why did you call it “Typhoeus” in the first place?
PAUL: Originally I wanted to call it Hydra, because I just thought, “Oh, it’s parallel, so it’s like a multi-headed beast of legend,” but the thing is, I did a search on that name and it seemed like it was too widely used, so I just thought, “OK, how can I come up with something a little bit more obscure?” [laughs]
JOSH: And you didn’t go with the obvious ——?
CHUCK: I was going to point out that it sounds like Avdi has Typhoeus fever. Anyway, let’s get in to the picks. Josh, what are your picks?
AVDI: I just thought that!
CHUCK: You just thought what?
AVDI: [laughs] It took me a few seconds.
AVDI: No, it just took me a few seconds to connect Typhoeus fever to typhoid fever. I get it.
CHUCK: Folks, we are having one of those days.
JOSH: OK. I think we need to quickly go to the picks. [laughs]
CHUCK: Yeah, no kidding. I make a terrible joke and nobody laughs and then it’s like “Oh, now I get it.”
AVDI: No, that was awesome.
JOSH: OK. My first pick is really simple; it’s “httpstatus.es”. So, HTTP Status, it’s just a really nice little website; you go there and it tells you what the codes mean. I use it frequently whenever I’m doing things like reading Paul’s book. And then, I have a little bit of self-serving stuff here; I have two picks that are videos from the latest GoGaRuCo, and they are both topical to service-oriented design. Jack Danger Canty did a talk on Mega Rails and David Copeland gave a talk on Services, Scale, Backgrounding and “Whiskey Tango Foxtrot” is going on here.
CHUCK: Whiskey Tango Foxtrot [laughs]
JOSH: That’s the family-safe pronunciation. So, they are different takes on the topic, but both of them talk about things from a more holistic viewpoint of: we are dealing with big applications, what is the strategy for handling this complexity? And services were significant components of both their approaches. So, both good videos, both a half hour long.
And since we didn’t put it in the announcements in the show, I’m also going to mention the Ruby Nuby Project. We’ve had a couple of people do videos, like half a dozen videos or so. I want, like, hundreds of videos. I think everybody who’s new to Ruby should do a video, and the videos have all been great and I just wanna see a whole bunch more. It’s really great watching them. So, just go to rubyrogues.com; there’s a link on the sidebar for the Ruby Nuby project. And we are definitely not going to let these videos get lost. We’ll collect them all somewhere once we have them all.
CHUCK: You mean like Pokémon? Yeah!
JOSH: Just like Pokémon. [laughs] You get a little USB drive with Ruby Nuby on it. But, moving right along. Who’s next?
CHUCK: Avdi, why don’t you go next.
PAUL: My next pick is just Go, the language. [laughs] “golang.org”. I’ve recently started writing a production server, or something that will be deployed in production later, in Go, and I found it to just be a pleasure to work with. And my reason for moving to Go stems from a kind of frustration with working with Scala. I thought Scala was the way forward, but I’m beginning to become disenchanted with it, so Go is what I’m looking at now.
My next one is Kafka, which is a distributed messaging system open sourced by LinkedIn. I mentioned before that messaging is one of the things that I’ve found at the core of any service-oriented architecture, and Kafka I think is really interesting because of the fact that it takes a very simple view on how messaging should work. It’s almost like a distributed log file that you can just read from any point, and you design around that.
And then my last one is also a messaging-based one. I saw that Bitly last week open sourced their distributed real-time messaging system called NSQ, and I read about that and I thought that is a pretty interesting approach and design around distributed messaging. I thought that was also cool because they did it in Go [laughs]. So, those are my picks.
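The “log file that you can just read from any point” idea mentioned for Kafka can be illustrated with a toy, in-memory Ruby class. This is a conceptual sketch only: `AppendOnlyLog` is invented for this example, it is single-process and not distributed, and it is not Kafka’s API.

```ruby
# A toy append-only log: producers append, and each consumer
# remembers its own offset and reads forward from there.
class AppendOnlyLog
  def initialize
    @entries = []
  end

  # Append a message and return the offset it was stored at.
  def append(message)
    @entries << message
    @entries.size - 1
  end

  # Read up to `limit` messages starting at `offset`.
  def read(offset, limit = 10)
    @entries[offset, limit] || []
  end
end

log = AppendOnlyLog.new
%w[signup login purchase].each { |event| log.append(event) }

# Two independent consumers, each tracking its own position.
audit_batch   = log.read(0) # replays everything from the start
billing_batch = log.read(2) # only cares about what came after offset 2
```

Because the log never deletes on read, a new consumer can be added later and still replay the full history, which is the property that makes it convenient to design services around.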
PAUL: Basically, the reason why I’m looking at those instead of Ruby is really performance based. If I have my choice, I’d write Ruby code, because with any of these other languages, the bar that’s set is Ruby as far as my joy in working with the language and writing the code, and none of them, I think, is actually going to meet that. But for performance reasons, I have to move to another language and let go of that. Because, like, actual real threading, which I know you can get with JRuby or Rubinius, but even still, it’s fairly trivial to create an app, like an API, in Scala that can handle thousands and thousands of requests per second on a single machine. Whereas generally with Ruby, when you are talking about handling that kind of load, you are talking about many, many servers. So, it’s really just a performance thing.
With Go, I mean, my interest in it is that it has the asynchrony stuff built in like Node does. I don’t know why, I just find Node, like the callback spaghetti and writing it on the server, to be a little bit distasteful. I think it’s kind of hard to keep it organized; maybe if we are talking about smaller things, it’s easier. I mean, I don’t have enough experience with Node to really say, so the jury is out on that one for me, but yeah, it’s just, I don’t– I’m not sure that Node is the thing I would use to build hard-core server-side architecture. Whereas when I’m looking at Scala or I’m looking at Go, that’s what I’m looking for: something that I can create scalable distributed systems with, where there are primitives built into the language that make that easier to do.
CHUCK: OK. Cool. Well, let’s wrap the show up. I just wanna remind everybody you can go sign up for the Ruby Rogues Parley at rubyrogues.com. You can also enter the Ruby Nuby project by going on to the same website. And I don’t think there are any major events or anything else coming up.
JOSH: We should announce our next book.
CHUCK: Oh, absolutely. You wanna take care of that first?
JOSH: Yeah, as everyone probably can tell by now, it’s the book “Practical Object-Oriented Design in Ruby” by Sandi Metz.
AVDI: I did not see that coming!
JOSH: Yeah, because you have been sleeping under a rock?
So we haven’t picked a time or the date for the Book Club episode but, it will probably be some time in November or early December.
CHUCK: Yeah and I have had a few people asking which Rogues are going to be at Ruby Conf and I believe that Josh is going.
JOSH: Yeah, I’m going to be doing a talk.
CHUCK: And I will also be there. I will not be speaking, which is kind of a nice thing at a conference.
JOSH: [laughs] Yeah my talk is in the last part of the last day, so I get to spend the whole conference worrying about it.
AVDI: What is your talk on?
JOSH: I’m doing an expanded version of the Thinking in Objects talk that I tried out at Steel City Ruby.
AVDI: Oh, excellent.
AVDI: I watched it, it’s good.
JOSH: Oh, cool. I’m going to make it 50% longer.
CHUCK: [laughs] Your talk is Avdi approved.
CHUCK: [laughs] All right. We’ll wrap the show up then and we’ll catch you all next week and thank you for listening!
AVDI: Bye folks.
PAUL: Bye. Thanks.