031 iPhreaks Show – High Performance Core Data with Matthew Morey


Panel

Matthew Morey (twitter github blog)
Jaim Zuber (twitter Sharp Five Software)
Andrew Madsen (twitter github blog)
Ben Scheirman (twitter github blog NSScreencast)
Charles Max Wood (twitter github Teach Me To Code Rails Ramp Up)

Discussion
00:35 – Matthew Morey Introduction

Buoy Explorer
ChaiOne

01:23 – Making Core Data Perform
05:45 – Importing Data
08:23 – Batch Sizing
09:37 – Photo Blobs
13:25 – Persistence
16:43 – Query Performance

String Comparison
Order of Operations
Hashing
Tokens

22:24 – Concurrency Models

Context
Notifications

Picks

iPad Telepresence Robot (Ben)
Mercurial SCM (Andrew)
Florian Kugler: Backstage with Nested Managed Object Contexts (Andrew)
Needle Doctor (Jaim)
Grado Labs Black1 (Jaim)
Remote: Office Not Required by Jason Fried and David Heinemeier Hansson (Chuck)
Audible (Chuck)
High Performance Core Data (Matthew)
Planet Money Podcast (Matthew)
Core Data: Data Storage and Management for iOS, OS X, and iCloud by Marcus S. Zarra  (Matthew)

Next Week
Security with Rob Napier
Transcript
BEN: That’s the problem is that when my kids see the mixer, they are like, “Oh, knobs and buttons! I'm going to push all of them.”

CHUCK: Hey everybody and welcome to episode 31 of the iPhreaks Show. This week on our panel, we have Jaim Zuber.

JAIM: Boy, that is one cranky Rottweiler.

CHUCK: Andrew Madsen.

ANDREW: Hi from Salt Lake City.

CHUCK: Ben Scheirman.

BEN: Hi from Houston.

CHUCK: I'm Charles Max Wood from devchat.tv and we have a special guest this week, and that is Matthew Morey.

MATTHEW: Hello, also from Houston.

CHUCK: So since you haven’t been on the show before, do you wanna introduce yourself?

MATTHEW: Sure. So I got a couple of degrees in semiconductor physics and electrical engineering and quickly did nothing with those degrees. I spent a couple of years working on embedded electronics and doing a lot of C programming. Then the iOS SDK came out and I jumped into that, and I've been doing my own apps, including Buoy Explorer, which is a marine conditions app for surfers and water sports enthusiasts, and where I implemented Core Data improperly. I also work for a company here in Houston called ChaiOne, where we do a lot of client work.

CHUCK: Yeah, I've met those guys before.

MATTHEW: My boss is a real stickler.

CHUCK: Yeah, I've heard that a couple of times. We brought you on today to talk about high performance Core Data. Are there tricks to making Core Data perform, or does it just work, or what?

MATTHEW: Well, you can check the check box in the templates and it will generally just work. The problem is that it is such a complex framework, and it's so flexible and large, that it's very easy to put yourself in a bind or do the wrong thing, and then suddenly you'll have performance issues. I spent a lot of time making those mistakes, and I finally got to the point where I just wanted to figure all that out and wrap my head around it. So I've been focusing on that a lot in particular.

JAIM: You mentioned in Buoy Explorer, you initially did it improperly. Do you wanna elaborate on what mistakes you made there?

MATTHEW: Yeah, so a common pattern in apps is that you have to import data: either the user's data from the server or just general data, be it JSON or XML. In Buoy Explorer's case, I'm downloading a bunch of data from these buoys that are out on the ocean and measure wind conditions. This data is very dense; there are readings every 15 minutes from hundreds and thousands of these buoys, so there's a large amount of data. And the way the data is structured, I can't really fetch it in a network-efficient way. Unfortunately, I have to grab large amounts of data at a time. And importing that data into the persistence layer, into Core Data, takes time; the data has to be parsed, the relationships have to be made, and then it has to be saved.

MATTHEW: And the first mistake, and most people make this mistake, is that they'll do these large import operations on the main queue, in the main context. If you are just using the built-in Xcode template, Apple is going to give you a single managed object context, and that's always going to be on the main queue, the same queue where all your UI work is done. So the first issue you always encounter is blocking the UI.

CHUCK: Blocking the UI because you don't have all the data in place to provide to the UI?

MATTHEW: You will be doing operations on that data, and the issue is that while you are doing that work, the UI is also trying to update. So you could be scrolling in a table list, and while you are scrolling, you are also trying to import data or create objects. That is all happening on the same queue, and you can't do two things at the same time.

CHUCK: Gotcha. So how do you get around that? How do you make it perform?

MATTHEW: The easiest way is to create a background thread and do that work on the background thread.

MATTHEW: That's a normal pattern for many performance issues. The problem is that Core Data has very strict multithreading policies. For example, a context is tied to its thread, so you can't easily pass objects between threads and between contexts. You can pass things like object IDs, or URI representations of the object. So it's not as easy as just creating another thread and doing a bunch of work on that background thread. You have to be smart about it. With iOS 5, we got parent contexts. What that means is that instead of giving a context a persistent store coordinator, you can set its parent to another context. So instead of persisting something all the way to disk, you can just push your changes up one level to the parent. That allows you to do things like work on a background thread while the main thread, the UI thread, is not blocked. But it probably makes sense to take a step back. What really made Core Data, and performance related to Core Data, make sense to me was realizing that it's a balance, an optimization problem. What you're trying to balance is the number of objects you keep in memory against how fast you want to be. On a Mavericks system, a desktop system, you have much more memory at your disposal, so you can load your whole object graph into memory and still be okay. The system is not going to terminate or kill your app. And since everything is in memory, it's going to be much faster. Obviously, in-memory objects are way faster than objects persisted on disk. But on iOS or another mobile device, where you are memory constrained, it's not feasible to load everything into memory. So you are forced to do this batching or fetching of objects. Some objects will live in memory, but some of their attributes won't; they will be faulted.

MATTHEW: Once I understood that Core Data is really a balance between memory and speed, it made sense. Where you lie on that continuum is really the choice you have to make.

JAIM: You mentioned importing data, like large amounts of data. How do you make sure you don't import the same data twice? It's that insert-or-update problem you run into if, say, you import from a CSV file and you don't have a way to exclude things you already have. How do you handle patterns like that?

MATTHEW: So the "find or create" algorithm is used pretty much everywhere in software. The naive way to do it would be to go through your new data, your JSON blob, and for each dictionary in that JSON extract a unique identifier, a GUID of some sort, and then query the Core Data stack to see if that object already exists. If it does, then it's an update. If it doesn't, then you need to create a new object. You can enumerate across all the objects and do that. The problem with that technique is that you're fetching one object at a time, and when you fetch one object at a time, you are possibly hitting the disk every time; you'll see slow performance immediately from doing something like that. So really, you should use a more efficient find-or-create algorithm. More efficient meaning that instead of hitting the disk every time, you batch those fetches. So if you have a thousand objects you are enumerating over, maybe you do 100 at a time, and instead of doing a thousand fetches that hit the disk, you are only doing ten. You go through each of those sets in batches. And really, the number is going to depend on the app; you have to fudge it until you find one that works best.

BEN: So you mean maybe grabbing the IDs of those ten items, and just checking to see if any of those ten exist, and if so, which ones they were?

BEN: If we are talking in terms of SQL statements, it would get translated to something like "SELECT * FROM buoys WHERE id = 55", and then you check to see if you got a result back. If you did, you have to update it; otherwise, you have to insert it. Would you do something slightly different, checking for the existence of many IDs at a time for that batch?

MATTHEW: Yeah, so in SQL terms, you would be performing an IN operation. You separate your new data, your imported data, into these batches, grab from the first batch all ten or hundred unique identifiers, whatever they are, and then with those identifiers you create a predicate, which translates into a SQL statement using the IN keyword. Then you only fetch objects from the persistent store that are in that batch.

JAIM: That sounds good. So when you are trying to figure out your batch size, is that just trial and error? Does the size of the objects you are trying to return have an effect? How else can you determine the right size?

MATTHEW: I think it's really important to measure. When you are talking about Core Data: measure, measure, measure. A lot of this stuff is not very clear, and you have to use the tools. So before I make any change, the first thing I do is measure; then I make the change and go back and verify that it had an effect. Instruments is the best way to do this. One of the templates is the Core Data template, and I like to add the Time Profiler in there as well. With that, you can measure how long your fetches, saves, or faults are taking. The batch size really is a function of how much memory you wanna use and how fast you want it to be, and there's no right answer to that. I've found that, in general, dividing by ten gives a good batch size. But it really just depends on how big your entities are.

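
As a rough illustration of the batched find or create that Matthew describes, the sketch below works at the SQL level rather than through Core Data. It uses Python's built-in sqlite3 module, and the `buoy` table, its `wind` column, and the `find_or_create` helper are hypothetical names chosen for this example. The point is only the shape of the algorithm: one IN query per batch, then an update or insert per record.

```python
import sqlite3


def find_or_create(conn, readings, batch_size=100):
    """Upsert readings with one IN query per batch instead of one query per record."""
    fetches = 0
    for start in range(0, len(readings), batch_size):
        batch = readings[start:start + batch_size]
        ids = [r["id"] for r in batch]
        placeholders = ",".join("?" * len(ids))
        # A single round trip finds every record in this batch that already exists.
        rows = conn.execute(
            f"SELECT id FROM buoy WHERE id IN ({placeholders})", ids
        ).fetchall()
        fetches += 1
        existing = {row[0] for row in rows}
        for r in batch:
            if r["id"] in existing:
                conn.execute("UPDATE buoy SET wind = ? WHERE id = ?", (r["wind"], r["id"]))
            else:
                conn.execute("INSERT INTO buoy (id, wind) VALUES (?, ?)", (r["id"], r["wind"]))
                existing.add(r["id"])  # guards against a repeated ID inside one batch
    conn.commit()
    return fetches


# Hypothetical data: 500 rows already stored, 1,000 rows arriving from the server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buoy (id INTEGER PRIMARY KEY, wind REAL)")
conn.executemany("INSERT INTO buoy VALUES (?, ?)", [(i, 0.0) for i in range(500)])

incoming = [{"id": i, "wind": 12.5} for i in range(1000)]
fetch_count = find_or_create(conn, incoming, batch_size=100)
total_rows = conn.execute("SELECT COUNT(*) FROM buoy").fetchone()[0]
```

With 1,000 incoming records and a batch size of 100, the loop issues ten lookups instead of a thousand. In Core Data the corresponding lookup would be a fetch request with an IN predicate, and the batch size remains something to measure rather than assume.
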
MATTHEW: If you're storing large photo blobs, then that batch size needs to be much smaller. If it's just a bunch of strings and it's really small, then that batch size can increase. So it's really about how much memory those objects take up.

BEN: You mentioned the photo blob thing. Being sort of naive about Core Data, I saw that and immediately wanted to avoid it. Say you are fetching these objects for display in a table, and then you tap on one and see the detail for that item, and one of the properties on that entity is an NSData representing photo data. Is there anything built in that will avoid loading every image for the entity, or would you have to specifically say, "Okay, don't load that"?

MATTHEW: I guess the way you would do it if you were using pure SQL would be to store a large blob like a photo externally on the file system and maintain a path or a URL to that file. With Core Data, you can specify how to treat a large blob. You can say, "Use external storage," and Core Data will decide whether it wants to store those bits in the SQLite database or just store a URL or a path to that photo or binary blob.

BEN: The URL is definitely more in line with what I've implemented in the past. If I let Core Data manage it, I'm worried about the select performance of a single entity. Would you structure this in a way where you tell it which properties to fetch, or would you just move the photo storage off to its own entity and reference it?

MATTHEW: It depends on the app. If you have a table-based app where you wanna show a thumbnail of an image in every single row, the best option is to take that large photo blob and put it off in its own entity. Because really, you only need that full-sized, full-quality photo when you are in a detail view or when you are zooming in on the picture.

MATTHEW: When you are in the table list, you just need the thumbnails. So in that situation, I would make a low-quality thumbnail version of the photo. It is much smaller, and I would store that in the database, but there would still be a relationship to a separate entity that holds the full-quality photo. And I would let Core Data decide how to manage that second entity, the photo entity. If it wants to put that large blob in the database, that's fine. If it wants to persist it out to disk and just maintain a URL, that's fine as well.

BEN: So you would typically select only one of those, right?

MATTHEW: Yeah, it will be one at a time.

BEN: Those are really two orthogonal questions. Whether you store the image in an external file, and whether you put it in its own entity at the other end of a relationship, are separate questions with different implications, right?

MATTHEW: Yeah, I guess. Could you elaborate? What do you mean is orthogonal about it? What's the issue?

BEN: Well, not storing a file in the database means that it keeps your database size smaller. Whether that's important or not is another question. But it seems like even if you just store the big data at the other end of a relationship, that in and of itself means the data will not be faulted in unless you actually access that relationship, as opposed to being faulted in as soon as you access any attribute on the parent entity.

MATTHEW: Yes and no. It depends on how you are doing the batches on the entities. You don't have to fetch all of an object. You can tell it which properties or attributes of the object you wanna fetch, and everything else will be a fault. You can also just grab object IDs. So Core Data gives you options to determine how much of that entity is actually going to be fetched.

BEN: Okay. And we are still sort of talking about the scenario where you're trying to do find or create, right?

MATTHEW: I mean, this isn't particular to find or create.

MATTHEW: If you already have the data in the app and you are shipping it with the app, you still wouldn't want full-sized photos in that primary entity. Especially if you are in a table list scrolling through thousands of these, you'd still want that off in a separate relationship.

BEN: I think that makes sense.

CHUCK: One thing I'm wondering about: Core Data, from what I understand, is more an API that wraps whatever query language it uses to actually talk to the database. So do you have some of the fine-grained control that we are used to with other relational databases, where you can set indexes and things like that, so that a query against a large set of data is really fast?

MATTHEW: You can set indexes on a Core Data store, but that's pretty much all you can do. You can tell it which attributes are indexed, but other than that, you can't do much. And that's primarily because SQLite isn't the only persistent store. It can be XML, or it can be in-memory and not persistent at all. And more recently we've gotten the incremental store, so you can do crazy things like use plist files and the file manager as your persistent store.

BEN: Yeah, I think a lot of times people talk about Core Data as if it is Apple's API for SQL, but that's not true; it's Apple's data persistence API. And the fact that it most often uses SQL is really kind of...

JAIM: I feel like that's one of the problems. At some level, things that abstract SQL away from you can be the leakiest abstractions. Here's a problem that is prevalent with Core Data, as with any technology that talks to a database and walks relationships. Say you had a customer that had an address in a one-to-one relationship, and you were displaying the customers and their addresses in the same list. If you have 100 customers, then you'd have one query to pull in the list of 100 customers.

JAIM: And then one query per row as you walk that relationship, each executing a statement against the database, right? So you have that standard select N+1 problem. There are ways to get around it in Core Data by setting additional relationships to fetch, so you can tell it, "Okay, I also want to pull in the address for all these customers." Then in the database it will do either two queries or a join, whatever it thinks is best. But still, you have to be aware that it's an issue.

MATTHEW: Well, I don't think you should think of Core Data as a persistence layer; that's one of its features. I think you should think of it as a model layer, as an object graph, so to speak. It's really good at relationships and it's really good at living in the Cocoa world. One of its features is persistence. And you are right, you have to pretend that it's not SQL, because otherwise you start trying to do SQLite optimizations that don't really make sense.

JAIM: How many talks on [unintelligible] just start out by saying [unintelligible] is not an ORM, "but I'm going to explain stuff in pure ORM language"?

CHUCK: [Chuckles]

MATTHEW: The fact that the most common persistent store for Core Data is SQLite does give it an advantage, though, because you can pass certain launch arguments, like SQL debug, which will give you the actual SQL [unintelligible] in the debugger output, and you can see what the SQL statements are. Then you can take that SQL statement, go find the SQLite file on disk, use any SQL tool you want or the command line to look into that SQLite file, see what's going on, and maybe see why stuff is taking longer than you think it should.

BEN: What about general query performance? Because we know that SQLite is under the hood, the way you structure queries can have an impact on performance, especially with very large datasets. Are there any tips for speeding up specific queries?

MATTHEW: Yeah, definitely.

MATTHEW: Computers are good with numbers, so any time you are doing a query and you structure that query with a predicate, you should definitely do the numerical comparisons first: the equals, the greater-thans, the less-thans, those types of comparisons. Any comparisons like that should happen first. String comparisons in particular are very expensive and should be avoided if you can help it. If you have to do a string comparison, there are some options. You can normalize or canonicalize the strings, which basically removes the diacritics and the case sensitivity. You can store a normalized version of your string, like a comments field or some description field, as a separate attribute on that managed object. Then when you do a search or a query on that field, you search the normalized version instead of the actual text.

CHUCK: So I don't completely understand what you said there. I understand that you are saying, basically, remove capitalization and things like that, but I didn't quite follow the other parts.

MATTHEW: So the order of operations definitely matters. A numerical comparison, like greater than or less than, goes first. For example, if you had a contacts app or an address book app, and you wanted all your friends who are older than 30 and named "Bob", you would want to do the "older than 30" comparison first, and then the string comparison for people named "Bob" second. That's because the numerical comparison is very efficient on a computer, while the string comparison may have to fire up a regular expression engine, and that takes time. Much more time.

BEN: And so the idea there is that you are filtering for the people over 30 first, so you are doing the string comparison on fewer rows.

MATTHEW: Yeah, correct.

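
The effect of ordering can be sketched outside Core Data with a plain short-circuit filter. This is an analogy under stated assumptions, not a description of how Core Data or SQLite evaluates a predicate: the `people` list and the `name_is` helper with its counter are made up for the example, and a real store may reorder or index comparisons on its own.

```python
# Hypothetical address book: 10,000 generated entries plus one "Bob".
people = [{"name": f"Person{i}", "age": i % 60} for i in range(10_000)]
people.append({"name": "Bob", "age": 42})

stats = {"string_compares": 0}


def name_is(person, wanted):
    """Case-insensitive name check that counts how often the string work runs."""
    stats["string_compares"] += 1
    return person["name"].casefold() == wanted.casefold()


# Cheap numeric test first: `and` short-circuits, so the string comparison
# only runs for entries that already passed the age check.
matches = [p for p in people if p["age"] > 30 and name_is(p, "bob")]

string_compares = stats["string_compares"]
```

Here the string comparison runs only for the entries older than 30, roughly half of the list, rather than for every entry as it would if the two conditions were swapped.
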
MATTHEW: So the heavy lifting, the heavy comparison of the strings, is happening on a smaller data set, so it will take less time.

JAIM: Do you ever use hashing of a string if you wanna do a lookup like that? Does it ever make sense?

MATTHEW: If we are talking about comparing strings and increasing that performance, the first thing to try is normalizing the strings. You can do that with a custom setter on the entity, so any time you set the comments or description string, you also set a normalized version of that string on the entity. If that's not good enough, the next step would be to separate that string, that sentence, into tokens split on white space. Then you can use the "begins with" or "ends with" technique of searching the string, so you don't have to do a "matches" or regular expression match. You search the tokens, which is much quicker. You still have to use a relationship, so there is a hit there, but it is much faster. And then again, if that's still too slow, I think the last option you should pursue is an in-memory hash. You will still need to tokenize the strings, the comments or the description, but then you can store only the first three letters of each of those tokens in a hash in memory. Even if you have millions of strings in a database, a hash of only the first three letters of each token is going to be pretty quick.

JAIM: Talking about the tokens, are those like a dictionary? How are they stored?

MATTHEW: In this example, a token would just be a word. So if you have some string, you can separate it on white space and normalize those strings, removing the capitalization and diacritics. Each word would be its own object or token, and it would have a relationship back to the original entity it came from.

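
A minimal sketch of the three steps (normalize, tokenize, then an in-memory hash keyed on the first three letters) might look like the following. It is written in Python with the standard unicodedata module rather than the Core Foundation functions, and `normalize`, `build_prefix_index`, and `search` are hypothetical helpers. In this sketch the hash maps each prefix to record IDs, which stands in for the relationship from a token back to its original entity.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Strip diacritics and case so that 'Café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)  # Form D splits letters from accents
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_prefix_index(comments, prefix_len=3):
    """Map the first few letters of every token to the IDs of records containing it."""
    index = defaultdict(set)
    for record_id, text in comments.items():
        for token in normalize(text).split():  # tokens are split on white space
            index[token[:prefix_len]].add(record_id)
    return index


def search(index, comments, term, prefix_len=3):
    """Narrow candidates with the in-memory hash, then confirm with a begins-with check."""
    needle = normalize(term)
    candidates = index.get(needle[:prefix_len], set())
    return sorted(
        rid for rid in candidates
        if any(tok.startswith(needle) for tok in normalize(comments[rid]).split())
    )


# Hypothetical comment fields keyed by record ID.
comments = {
    1: "Crème brûlée at the café",
    2: "Surf report: glassy waves",
    3: "CAFETERIA closed today",
}
index = build_prefix_index(comments)
hits = search(index, comments, "Cafe")
```

One limitation of this sketch is that search terms shorter than the prefix length only match tokens of that same short length; a fuller version would need a fallback for one- and two-letter searches.
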
MATTHEW: And the idea here is that if you put these tokens, normalized and cleaned up, into a separate entity, it's going to be more performant to search that than to pull every single object into memory and search all the content in a non-normalized way.

JAIM: Okay, so we've got a separate entity that has information on, let's say, a sentence of five words. Is that going to be five separate tokens, or do you just normalize it and put it in one string?

MATTHEW: You could do it both ways. If you are going to go this far down the rabbit hole, you might as well separate it on white space, and you would have five entities for the five words in the string.

JAIM: Okay, so each entity can be a way to look at what's happening?

MATTHEW: Right. And what this allows you to do is use "begins with" and "ends with" predicates, which are much quicker than a "contains" or a "matches". A "begins with" just does a quick comparison against the first letter of the word you are comparing against, then the second letter, then the third letter, so it's much quicker than trying to do a "matches", a "contains", or a regular expression type of match.

JAIM: Okay, very cool.

MATTHEW: The problem is that when a user is searching, you don't know if they are searching for a word that's in the middle of the sentence, at the end of the sentence, or at the beginning of the sentence. So you can't just do a "begins with" if you don't separate it into individual words or tokens.

CHUCK: Are there systems or good ways of doing that on iOS?

MATTHEW: We have a couple of C functions that enable this behavior, but there's no standard support, so you kind of have to roll your own. The C functions are called CFStringNormalize and CFStringFold. To CFStringFold you can pass options such as case insensitive, diacritic insensitive, and width insensitive. And then you can also pass one of the normalization forms; the most popular one, I think, is Form D.

MATTHEW: I'm not quite sure what the different forms mean, but I know Form D is kind of the most popular one.

JAIM: What other wrong ways are there to do Core Data?

MATTHEW: Setting up your concurrency model, the way you interact with Core Data, is a common place where you can go wrong. Apple gives you the simple, single managed object context, and that works for most situations. But if you need to start doing large import operations or the like, your concurrency model is probably going to need to change, and you will probably need to use multiple managed object contexts. You have multiple ways of doing that. You can use the parent-child technique, where a child has another context as its parent in place of a persistent store coordinator. Or you can do the more classical setup, where you have two separate contexts that are equals, share a persistent store coordinator, and are not nested. Have you used either of those?

ANDREW: I ran into this problem last week. We were using a child context, not three levels deep, but just a main context on the main queue and then a private queue context that we created when we needed to do a big import. We saved the data from the private context back up to the main context, and then to disk. That turned out to be really slow, so I switched to using two separate contexts that share a persistent store coordinator, and merging with mergeChangesFromContextDidSaveNotification:, and it was much, much faster. So it turned out to be a huge difference in the performance of the save, and it really was not that much harder. I thought the private child context API was easier, and that's why we went with it in the first place. But it actually turned out that doing it the old-fashioned way with two separate contexts was not any more difficult; it was not even any extra lines of code in the end.

MATTHEW: So in your first attempt, you had the main managed object context, and that saved directly to the persistent store coordinator. And then when you needed it, you would create a child context on a private queue off of that one?

ANDREW: Right, that's true.

MATTHEW: So in that situation, when you do that save on the main queue, the save to disk is still going to be blocking. One way you can relieve that, and make the save asynchronous, is by adding a third context. You would have your child import queue, or worker queue, and its parent would be your main managed object context, on the main queue where the UI work is. The parent of that would be yet another managed object context, which you could call the writer context or the private background writer context, and that's the one that actually saves to disk. In that situation, persisting to disk is asynchronous, because you are using a private queue to do it. But you would still block any time you try to read from disk while that write is taking place, and that's because the persistent store coordinator serializes all requests.

JAIM: So what's involved in keeping changes in sync between the different contexts? What kind of code do you have to write to do that?

MATTHEW: With this parent-child context technique, really, you don't have to do anything. You just have to make sure you perform your operations on the same queue as the context, and you can do that by using the performBlock: API. Core Data is going to handle all of that. The persistent store coordinator is where your object IDs live, and that's really how Core Data tracks all these objects in the same persistent store coordinator. And since these contexts are children of a parent context, it all just works. There is no issue transferring the data back and forth.

MATTHEW: If you are going the route of using two separate contexts that are equals, that are not in a parent-child relationship but do share a persistent store coordinator, it's best to use notifications to pass changes between them. You can still use object IDs between the two contexts, and you can use URI representations between the two contexts. You can't pass the actual object between them, but it's easy to grab the object once you have the ID. In that case, when one of the contexts saves, it posts a notification. And if you want the other context to do something with those changes, Core Data provides an easy method to do that: mergeChangesFromContextDidSaveNotification: is the actual method.

JAIM: Do you roll your own notifications, or are these provided for you?

MATTHEW: They are provided for you. There's a "will save" and a "did save" notification. The "did save" notification is usually the most viable one. If you are rolling your own syncing service or an incremental store, or you are trying to do something with iCloud, the "will save" notification becomes important. That's because during the "will save" notification, you are notified of which objects have changed and which objects were inserted. By the "did save" notification, some of the data might have failed validation or might have been modified, and it's hard to keep track of that. So in certain situations you wanna use "will save", but in most situations you wanna use "did save".

JAIM: Okay, so in your contexts, you would just put a hook in there for "will save", do your validations, and make sure everything is correct?

MATTHEW: Exactly. The reason we have so many concurrency models relates back to the fact that the persistent store coordinator is a serial queue, and you can't read and write through it at the same time. So we try to get creative and come up with all these different types of models that alleviate certain pain points.

And so when we are using child and parent contexts, it kind of handles that asynchronous save to the persistent store, but then you have the issue that all your data has to be pushed through each context. When we use the sibling type of concurrency model, where they are both equals, the tradeoff there is that you don’t have that easy way of notifying the other context; you do have to manually merge those changes in. And with the recent release of Mavericks and iOS 7, there’s a way to get around that persistent store coordinator locking.

BEN: Can you tell us about that?

MATTHEW: So if you’re using SQLite, the way it actually stores data to disk has traditionally been journaling mode. And with the recent versions, Mavericks and iOS 7, they enabled write-ahead logging, and so it’s like a transactional type of way to persist this data, so each change is like a transaction that can be performed. And so with write-ahead logging, you get concurrent reads, so you can have multiple reads from your SQLite file at the same time, and one single concurrent write at the same time. The persistent store coordinator is still going to serialize all requests that come through it. Reads or writes, they are always going to be serial. But you can have as many persistent store coordinators as you want, and they can all talk to the same persistent store file. And because the persistent store file now, if you are using SQLite, is using write-ahead logging, which is the default with iOS 7 and Mavericks, you get concurrent reads and one write. There are a lot of gotchas with that, though, and I don’t recommend it unless you really, really, really need that last bit of performance. And the reason is that if you have two separate persistent store coordinators, you pretty much have to treat it as if it is two complete Core Data stacks. And you can’t pass object IDs, because those are tied to the persistent store coordinator.
You definitely can’t pass objects; relationships don’t really work, but what you can do is use URI representations, so that will work between the two.

JAIM: What does the URI look like?

MATTHEW: Good question. [unintelligible] persistent object across the system. I’m not sure. I’ve actually never looked at the actual string that it generates. I’m assuming it’s going to be the hex object ID, which is just going to be a long hexadecimal number, followed by some kind of delimiter like a colon or something, and each of the attributes’ values.

CHUCK: All right. Well, it’s been a pretty good discussion. I know that some of us have some time constraints today, so we are going to have to start wrapping it up, but thanks for coming and talking to us about Core Data.

MATTHEW: Yeah, it was awesome. I enjoy the show, and I always like talking about Core Data, which might sound weird.

CHUCK: [Chuckles] Yeah, and sorry for the technical issues on my end. But let’s go ahead and get into the picks. Ben, do you wanna start us off with picks?

BEN: Sure. I just have one pick today, and that is that at ChaiOne, we have some remote workers, and we always try to do things in our culture in our company to help enable remote workers to be more effective and be more a part of the office. And one of those things we’ve done recently is we secured an iPad Telepresence Robot. You can find these at http://www.doublerobotics.com/. They are pretty expensive, but it’s a little robot on a stand, and literally somebody else across the world uses the iPad camera and the internet connectivity through a web browser, and they can use the arrow keys, kind of like you are playing with W, A, S, D. You can move the robot around the office, and you can raise and lower the stand to make yourself taller or shorter. It’s pretty creepy, but kind of a cool addition to a workplace that supports remote workers. That’s my pick.

CHUCK: Awesome. Andrew, what are your picks?

ANDREW: I have two picks today. The first one is Mercurial. And Mercurial is sort of like Git; it’s a distributed source control system. And I’m not going to talk about the differences between Mercurial and Git, but we used Mercurial. I really like the UI and the command line UI, and I’ve switched to using it for my personal projects. And the second one is relevant to our talk today, and it actually really helped me this week, so it’s a blog post by Florian Kugler called ‘Backstage with Nested Managed Object Contexts’. He actually disassembled the Core Data framework into assembly to work out some of the reasons why the child context stack method is less efficient for certain things than having two separate contexts with the same persistent store coordinator. It was just kind of a low-level dive on that, which I thought was really interesting and also useful. Those are my picks.

CHUCK: All right. Jaim, what are your picks?

JAIM: So I’m going to make an audio pick today. So a good friend of mine bought like an old 70s audiophile record player. So I thought to myself, “I’m going to get my record player back working again.” So it turns out the thing I bought in the pawn shop in like the 90s, it’s actually reasonable, so I called up a place, the Needle Doctor, which is the old hi-fi shop in town. They are online at needledoctor.com, but actually they helped me pick out a really nice new cartridge, which sounds pretty amazing. It’s very forgiving for old scratchy vinyl, for someone like me who just bought records when they were kind of cheap and kind of scratched up, but yeah, they really sound great. The cartridge is the Grado Black. It’s like $60, so as kind of an upgrade for your system, it’s really a fairly good value for what I paid for it. So I recommend that.

BEN: Good choice. I have a Grado Gold on my turntable. I’m a big fan of Grado cartridges, so.

JAIM: Yeah, my records sounded better than I thought. I thought they’d be all scratched up.
Almost sounds as good as a CD, but I’m not going there.

CHUCK: [Laughs] I know there are some diehard vinyl people out there who say that nothing sounds as good as vinyl. Anyway…

JAIM: People hate me right now.

[Laughter]

CHUCK: I don’t own any vinyl, so there you go. Anyway, so my picks actually go along with what Ben was talking about. I’ve been listening to a book on Audible; it’s called Remote. It’s by David Heinemeier Hansson and Jason Fried. And they are the guys that wrote Rework, and they just talk about the trends and the benefits and the tradeoffs of remote working. And they give a lot of pointers on how to do it, and I’m really enjoying it. So I’m going to pick that book. And I’ll pick Audible as well. Just an awesome service, and I really enjoy listening to audiobooks. In fact, I was stealing my wife’s credits on her Audible account, so she made me get my own. But anyway, Matthew, what are your picks?

MATTHEW: So I have three picks. The first one would be a self-pick, http://highperformancecoredata.com/. I’m trying to put together a resource of performance problems related to Core Data and their solutions. And so, if you have an issue, definitely get in touch so I can try and get something out there. Second pick is the Planet Money Podcast. A lot of times we kind of get wrapped up in technical podcasts. And although I won’t miss an iPhreaks episode, I also won’t miss a Planet Money episode. The third pick is Core Data, Second Edition by Marcus Zarra. That’s one of the Pragmatic Programmers books. And by far, that is one of the best Objective-C books, with the way it’s written and the way it walks through implementing Core Data and all the pitfalls that come with it. It’s a really great book. If you do anything with Core Data, you should get the book by Marcus Zarra.

CHUCK: Awesome. All right, well, we’ll go ahead and wrap up the show. Let everybody get to the stuff that they have going on today. We’ll catch you all next week!
