278 RR Consequences of an Insightful Algorithm with Carina C Zona
- Published on: September 21, 2016
0:45 – Introducing Carina C. Zona
2:10 – Coding consequences
6:00 – Examples of consequences
10:50 – Data quality theories
14:05 – Preventable Mistakes and Algorithmic Transparency
17:30 – Predictive Policing and Biased Data
22:07 – Coder Responsibility
- Mechanical Turk
- Google Crowdsource App
- “Social Network Nextdoor Moves To Block Racial Profiling Online”
- “raceAhead: How Nextdoor Reduced Racist Postings Using Empathy”
31:35 – Algorithm triggers
37:20 – Fixing a mistake
40:15 – Trusting humans versus trusting machines
- Facebook Trending Topics
44:30 – Considering social consequences
47:30 – Confronting the uncomfortable
50:30 – Fitbit Example
- “How Data From Wearable Tech Can Be Used Against You In A Court Of Law”
- “This chicken breast has a surprisingly healthy heart rate, considering it’s dead”
- OSFeels 2016 Talk by Emily Gorcenski with chicken example
99 Bottles by Sandi Metz (David)
Vivaldi Browser (Saron)
Magnetic Sticky Notes (Saron)
Ruby Remote Conf Recordings (Charles)
Rails Remote Conf (Charles)
Books by Howard Zinn (Carina)
Charles: Hey everybody and welcome to Episode 278 of the Ruby Rogues Podcast. This week on our panel we have David Brady.
David: Today I learned that on a ship, the poop deck gets its name from the French word “poupe,” which literally means the back of the boat, effectively ruling out my claims to a rich naval heritage.
Charles: Saron Yitbarek.
Saron: Hey, everybody.
Charles: Sam Livingston-Gray.
Sam: This sentence, no verb.
Charles: I’m Charles Max Wood from Devchat.tv. We also have a special guest this week, that’s Carina C. Zona.
Carina: Good morning.
Charles: Do you want to give us a brief introduction?
Carina: Sure. I am a developer and Dev Evangelist, a noisy advocate on Twitter, and the founder of CallBack Women, which is a project to radically expand gender diversity at the podium of professional programmers’ conferences. I am also one of the administrators of We So Crafty, a Slack group for techies who craft.
Carina: I’m a certified sex educator. I love talking about any and all of these topics.
Sam: There’s a certifying board for that?
Carina: Well, there actually is, I think, a professional certifying board, but my certificates are from other organizations, including the Unitarian Universalist Church, which has a lifespan comprehensive sex education program that starts in kindergarten and goes through senior citizens. There are various levels, and I’m certified for several of those levels.
Sam: Very cool.
Charles: Interesting. I think that’s a first on this show, I have to say.
Carina: I’d like to hope so. Although there are other sex educators in tech; there’s a number of us. We’re here.
Charles: Nice. We brought you on to talk about consequences of an insightful algorithm. I kind of blew through the talk rather quickly. Do you want to give us the sort of starting point or the main point that you want us to talk about? Then we can go from there?
Carina: Sure. The talk deals essentially with unintended consequences, and specifically harms, from really just everyday coding: the way that we make decisions can have side effects that are pretty personally devastating, and we’re usually completely oblivious to that. It’s really trying to surface some of the different ways that harms have been inadvertently perpetrated by others, and what we can learn so that we have some principles for how to prevent those ourselves.
In part, it’s scaring the pants off of people. A number of people have given it reviews like intense and dark. There are a number of content warnings because it deals with a whole lot of sensitive topics, which is the point: you think you’re doing something really innocuous, and somehow it can end up involving really touchy subjects. That’s the big picture of it. As a consequence, it also deals with stuff like algorithmic bias and algorithmic transparency, and a particular field of machine learning that’s really emerged in the last couple of years called deep learning. We can talk more about any of those as well.
Charles: Yeah, I have to say, with some of the examples in the talk, I was just like, “Wow.” At the same time, it brings us to the place of: how could you make it not do that? I don’t know if I have good answers; some of the things you brought up definitely would help. You mentioned machine learning. I know that with a lot of these algorithms you just feed in a ton of information, teach it how to sort through it, and eventually it learns. So is it, okay, am I just not feeding it the right sample set?
Carina: There are various causes; this is certainly one of them. For machine learning, you have to be really careful that the training data set is very close to the production data set. It’s really easy for those to differ, particularly when we’re talking about scale. You’re using something smaller-scale to do the training. You’ve got big data flowing through continuously in production, but the training happens maybe once, or periodically. You can easily end up having greater diversity of data coming through that production stream than what happened to be hit by the training.
We’re humans; we’re constantly changing, constantly evolving, even on the macro scale, from day to day or minute to minute. The data that was totally on point a couple of weeks ago can start migrating away from that. We have a lot of opportunity for something that seemed like really apt training to still be off base.
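What Carina describes, a training sample drifting away from the production stream, is something you can check for mechanically. Here is a minimal Ruby sketch of one such check; the method names, data, and threshold are all illustrative, not from any real library:

```ruby
# Compare how often each value shows up in the training sample versus
# the live production stream, and flag values that production sees
# regularly but training essentially never did.

def frequency(items)
  counts = items.tally
  total = items.size.to_f
  counts.transform_values { |c| c / total }
end

def underrepresented(training, production, threshold: 0.01)
  train_freq = frequency(training)
  frequency(production)
    .select { |value, freq| freq > threshold && train_freq.fetch(value, 0.0) < threshold }
    .keys
end

# Toy data: skin-tone categories seen during training vs. in production.
training   = %w[light light light medium light medium]
production = %w[light medium dark dark light dark medium dark]

p underrepresented(training, production)  # => ["dark"]
```

A check like this won’t tell you why a value is underrepresented, but it can flag, before a model ships, that production is full of inputs the training set has essentially never seen.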
David: Carina, we’ve got some listeners who are going to go from this episode directly to watching your talk, and they don’t know what examples you’re talking about. Can you give them an example? I have to confess, I haven’t seen your talk yet; I’m going to watch it next. I’m here as the straw man for your argument, but you said machine learning, so I can already start to see some of the horrors that are coming out of this. Can you give us at least an example of an example?
Carina: Yeah, thanks for asking. Not that particular situation, but some other ones that are related. This one is particularly fascinating to me. There were a couple with image recognition. One was Flickr’s implementation of deep learning, which uses an artificial neural net to do unlabeled training. You don’t have to label photographs (this is of this, that is of that); it learns on its own how to group and categorize things, and name them, just from having seen other objects already having been named. It’s looking for those patterns.
Flickr had implemented the technology. It ended up labelling an absolutely gorgeous, really well-composed photograph of a black man as an ape. In the context of US history particularly, associating an African-American person with apes, monkeys, gorillas, etc., has a long history of being a deeply offensive racial slur. For those listening from outside the US who are not familiar with this history: you could hardly pick a worse one, this is a pretty bad one. On the face of it, it can easily appear to be intentionally racist in some way. I have theories that it is not, and I can explain why.
What happened a month after that one was well publicized is that Google Photos, which had newly implemented deep learning to label photos, had a nearly identical incident in which a photograph, in this case of two black people, was labelled as gorillas. You’ve got the same problem. Here is one of the things we have to pay attention to: we have to pay attention to what other people are doing. We can’t be so isolated that we’re unaware that, oh, that happened somewhere else; we should really check, let’s set up a test to make sure ours doesn’t do that.
It’s really easy to not be paying attention. The consequence of that is not just for the person affected; we also end up looking worse. Google Photos, for instance, ended up looking even worse the second time around: more foolish, like you don’t care, like potentially it was a deliberate decision not to care. Whether any of that’s true or not, I don’t know. But it ends up looking far worse the second time than the first time.
This is the reason to really make sure that we’re paying attention to the industry, particularly paying attention to others’ screw-ups, and saying immediately, “Let’s check to see whether we would make that same screw-up.”
Charles: This reminds me of the thing that Hewlett-Packard did back in 2009, where they released a webcam that would digitally track you. It would take a huge picture, super wide angle, at super high resolution, figure out where your face was, then crop the image and zoom in on that square. Basically, it was face tracking. The engineers that developed it were all white dudes. They literally did things like: the forehead should be lighter than the eyes, so we’re going to look for some dark spots in a lighter field.
The end result was an entire line of actual hardware, in production, in stores, where you could walk into a Best Buy (well, Walmart doesn’t sell computers) and walk up to a computer. If you happened to have darker skin, the laptop would deny your existence as a human being. It would literally say, “No one is present.” That’s brutal.
Carina: Yeah, it is. If you think about where we’ve gone in later years, we have stuff like phones doing security lock and unlock based on things like recognizing your face. Now it’s not just that it was offensive or annoying; it becomes, you’re actually preventing me from using my own systems. I think within a couple of years we’re going to start seeing that with desktop computers and laptops as well, as facial recognition to log in becomes an option. We have to deal with these problems.
You brought up essentially deliberate bias. Not deliberate racism, but conscious choices that left out important data, important data being that not everybody has light skin, in the case of some of the examples that I brought.
Charles: I should clarify: I am conjecturing, as an engineer, that that’s what they did. I don’t want to slam these guys for something they may not have done deliberately. It’s very clear that they never tested against somebody with very dark skin as part of their test suite; it’s obvious that it never occurred to them to do this. Sorry, they screwed up, but I don’t want to accuse them of more screw-up than they actually did.
Carina: Fair enough. It is a legitimate theory; we don’t know whether it can be applied to this case. Another legitimate theory is related to data quality. Back in the 50s, when Kodak was first coming up with film emulsions for color, they made a decision (we’re talking about an era in which the US was still highly segregated) that they were only interested in the market of white-skinned people.
Everything with the calibration of film development was based on photographs of white women, calibrating to get the most detail out of white skin. Dark skin, let alone black skin, simply was not part of the consideration. With film emulsion, there’s literally an algorithm there: we repeat these steps in order to get this consistent outcome. And the algorithm was: only do that for white people. It wasn’t until the 70s and 80s that they actually started getting pressure from, ironically, manufacturers of chocolate and dark woods saying, “Your film is hurting our product marketing.”
Carina: They actually, at that point, put out something else.
David: We’re not going to change for people, but for chocolate, okay.
Carina: Right, right. That’s money. The entire market of human beings is not important, but commercial interests, yeah. Right at the same time, we were also developing digital imaging, and they likewise started with a white model as the reference point. So you have what, at this point, 60-plus years of imaging being built around the idea of optimizing for detail in light skin and not caring at all about detail in darker skin. The darker you get, essentially, the lower the quality of the data being gathered. If you’re not aware of that, you’re treating the entire data set as being of essentially equal quality.
We’ve got this legacy of all this data that’s been carried over, and we’re still doing imaging the same way. You can’t start off with current cameras and say, “Hey, let’s make photographs render completely differently.” So here we have this legacy; we’re still dealing with it, and we don’t know we’re dealing with it. Inevitably that means that photographs really differ in quality, in ways that are not discernible to the human eye but very discernible to machine learning. It is more prone to make mistakes because the data simply isn’t there.
Saron: When we talk about things like image recognition and things that involve skin color, or just images, that feels very preventable. It feels like, “Let’s consider all of the colors and not just some of the colors.” For the examples that you’ve researched and all the work that you’ve done on algorithms, how many of the mistakes that we’ve seen are preventable in that way? And how much of it is just, “Let’s hope we did this right, but as soon as we make a mistake, let’s see how we can fix it”?
Carina: I think we always have to have in our toolbox that tool of, “We made a mistake, we need to go back.” In between is, “We apologize so much; we will make sure this doesn’t happen again.” We have to be willing to publicly admit it and recognize the impact.
A lot of these problems are hard; there’s a reason why these mistakes happen. Some of it is on us, as I just said, to take note. Some of it, for instance, is how Google dealt immediately with the problem: nothing could be labelled gorilla, which has problems at the other end, with things that actually are gorillas. On the other hand, we assume a less harmful human impact than labelling people as gorillas; someone not getting their gorilla photographs tagged is probably more okay.
We can make some choices for certain things or words: how can we handle them in ways that minimize harm? That doesn’t necessarily mean omitting everything that we think could be offensive. It means anticipation alone helps: just anticipate that some stuff will end up being problematic, envision some of the things that could be problematic, and have some sort of policies in place for how to deal with that. Certainly, testing: you definitely want to have tests. You want to make sure that you’re not going to have a regression somewhere down the line on something you’ve consciously tried to deal with. There are some things we can do.
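As a concrete illustration of the kind of policy layer Carina is describing, here is a hedged Ruby sketch. The labels, thresholds, and structure are all invented for illustration; this is not how Flickr or Google actually implemented anything. The idea: known-risky labels get a much higher confidence bar, and below that bar they go to a human instead of being auto-applied or silently dropped.

```ruby
# Labels with a known history of harm require near-certainty before
# the system will apply them on its own; otherwise a human reviews.
SENSITIVE_LABELS = {
  "gorilla"    => 0.98,
  "ape"        => 0.98,
  "jungle gym" => 0.90
}.freeze

Prediction = Struct.new(:label, :confidence)

def apply_labels(predictions, default_threshold: 0.75)
  predictions.each_with_object(auto: [], review: []) do |pred, out|
    threshold = SENSITIVE_LABELS.fetch(pred.label, default_threshold)
    if pred.confidence >= threshold
      out[:auto] << pred.label
    elsif SENSITIVE_LABELS.key?(pred.label)
      out[:review] << pred.label  # a human decides, instead of the model
    end
    # non-sensitive, low-confidence labels are simply not applied
  end
end

preds = [Prediction.new("portrait", 0.92), Prediction.new("ape", 0.81)]
p apply_labels(preds)  # "portrait" auto-applied; "ape" routed to review
```

The design choice here mirrors the trade-off in the conversation: rather than banning the word entirely (and breaking actual gorilla photos) or trusting the model blindly, the risky cases fall through to human judgment.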
This is where algorithmic transparency also comes in, which is a principle of making the algorithms public, in much the same way you have open source software. The reason you do this is essentially like open source: it’s that ideal that lots of eyes surface more problems. You’re able to have a much more diverse populace looking at these and saying, “I see something, something that you might not have anticipated, that I know very well.”
Most black people, for instance, or people from the Indian subcontinent, are very aware that it’s hard to get a good picture; that’s not a surprising problem for them. If you ask the right person, “Hey, what could go wrong here?”, they can immediately tell you, “I know one.”
David: I feel like there’s another layer here. You talked about the importance of algorithmic transparency, and I agree that’s very crucial, but there’s another category of this kind of problem. I think a good example is the movement towards predictive policing, which is the idea that you can feed a bunch of crime statistics into a machine learning model and then have it tell you where crime is likely to occur. It may be that the machine learning algorithm there is just trying to pick up on patterns that humans might not immediately spot. Except the problem with that, of course, is that if you’re feeding it data on arrest rates, that data itself comes from decades and decades of police bias in who they choose to pursue and arrest, and whom they choose to police.
I feel like we also have to be able to have these conversations not only about how our algorithms are biased; we need to be able to talk about and recognize when the data itself that we’re feeding in is bad.
Carina: Yeah. In the case of that one, you really have to reconsider even using, say, arrest and conviction records as a proxy for where crime happens or who perpetrates crime. Those are actually separate things. We have to be reminded: that’s not the data, that’s a proxy for the data.
Charles: Yeah. There’s part of me that wants to say, “Don’t do predictive policing based on this, maybe, until we understand it.” And there’s a part of me that wants to say, “What you feed in is what you’re going to get out.” So instead of saying, “We’re going to predict where crime happens,” no: “We’re going to predict where you’re likely to make an arrest.” Based on your track record, you’re likely to shoot a black kid in this neighborhood. That’s the corpus you’re working off; you’re working off the enforcement data. We’re not predicting where you’re going to find crime, we’re predicting what you’re going to enforce. I think there’s some educational value in that.
Carina: That’s interesting.
Sam: I feel like, to bring it back to our responsibilities as computer professionals of some sort or another, we need to recognize this tendency we have to take biased data, throw it into an algorithm, and assert that somehow, magically, it isn’t biased anymore. I think we need to be able to recognize that that’s a problem and figure out how we might deal with it.
Saron: Yeah, I agree.
Carina: In the example you give, one of the problems is that it’s also being used for the purpose of sentencing. You brought up predictive policing as what’s being extracted by this particular algorithm: can we anticipate where crime’s going to happen and be there, essentially on the spot, to prevent rather than to catch after the fact?
Similar algorithms, similar data sets are being used also for sentencing guidelines. Judges are looking at these and essentially saying, “A person like you normally gets sentenced for this kind of crime, x amount of time, that’s the sentence I’ll give to you.”
Like what you were raising, Sam, this is a data set full of bias to begin with, in that African-American people are punished much more harshly for similar crimes than white people in America. If you’re looking at that history and saying, “A person similar to you historically gets this kind of sentence,” you’re also locking in a system of bias that’s existed for a long time. You’re not making a good prediction of what kind of sentence will be rehabilitative, for instance. You’re imposing a racial legacy that we’re trying to distance ourselves from.
Sam: Yeah, yeah. Sometime back around 2003, I spoke to a judge here in Multnomah County in Oregon, his name is Michael Marcus, who had been working on a system that tried to do some correlations between sentences that were given and later [inaudible 00:21:43] to see if there was any data he could get out of that. It was supposed to be used as a guide for judges to figure out how to sentence people. I thought that was really, really interesting. It seemed like it had a lot of promise. It was a fun thing to run across early in my career.
Carina: Promise and yet very big flaws.
Saron: David brought up the point earlier about dealing with biased information: if you’re feeding an algorithm all this biased information, you’re going to end up with results that are biased. As a programmer today, appreciating the fact that the product has many, many different pieces and is built on years and years of research and work, on the shoulders of past programmers: if I am building something like a video app, or a photo app, and I have mountains of data on how to process fair skin but, because of that historical bias, I don’t have data on processing darker skin, what is my responsibility? Do I just throw out that information because it’s not fair in the end? Is it my job to somehow make up for that lack of information? How do I make product decisions in that type of situation?
Carina: I think in part the words “how do I make product decisions” answer it themselves. You’re not the product manager; you’re the programmer. The product manager should be hearing this stuff and should be making the hardest decisions about how to deal with it. What we can do is say, “Here’s a list of ways I know how to deal with it. Let’s, as a team, discuss and find even more.”
One of the ways that we deal with these problems is by having diverse teams and by crowdsourcing to people outside of our team. Right now, Google is dealing with translation and accent recognition both by crowdsourcing. They’re using Mechanical Turk to crowdsource how to interpret Scottish accents. It turns out that that is where they fall apart completely. In fact, if you look on YouTube, there’s some hilarious videos.
Google also has an app, whose name I believe is Crowdsource, which gives people examples of written and verbal language to interpret for them, essentially teaching their algorithms how to be better at interpreting hard problems. This is a great way to extend your knowledge beyond the team and let an entire world of people contribute their data, their knowledge, their perspective to these problems. We don’t have to solve this all within our team. In fact, we should assume that there’s no way for our teams to be the first to understand a world of perspectives.
Charles: I want to reiterate a point here. I hear this fairly frequently, this question that’s basically: I see these problems, and I have a hard time finding enough diverse people, or building a diverse enough team, to solve them. So I just want to reiterate that point: you don’t have to have those people on your team. It’s ideal if you have some of them, if you can find them and get them on your team. Crowdsourcing outside your team will help you compensate for the other biases that you can’t account for by having people on your team.
Sam: And wherever possible, pay for the time that you use.
Carina: Yeah. Mechanical Turk pays a little pittance; Crowdsource doesn’t. If you look at the reviews for the app, people do call that out, fairly. I eagerly did a bunch last night, and there was a point at which I said, “I just spent an hour on this.” I don’t need a lot of money, but Google’s getting a whole bunch of value out of what I and all these other people are doing. They can afford a little pocket change.
Going back to your point about diversifying, I’m going to draw a hard line here and say you do have to have a diverse team. It’s not enough to give up and say, “This is really hard, I can’t have a diverse team.” You have to start there. Crowdsourcing is essentially additive, not a substitute. You have to be able to solve a lot of problems within that room.
David: I was wondering if we were going to let Chuck get away with saying, “You don’t have to have those people on your team.” He did say “those people.”
Charles: I was going to say, thank you for clarifying that. What I was essentially trying to say was that the ideal situation is that you have the diverse people on your team, so go find them. Then, for the biases you can’t find people to help you account for, and as you said, there’s no way to have a team that’s diverse enough to account for all of these situations, make sure you’re doing the crowdsourcing.
Sam: Carina, did you see the article about Nextdoor that came out this week?
Carina: Yeah, go for it.
Sam: Nextdoor, I guess, is a neighborhood watch app. It’s like Facebook: you can basically post “suspicious person breaking into a car.”
David: We also sometimes use it for, “Here’s some stuff for sale.” That seems to be the majority use case.
Sam: Okay, yeah. I guess the problem with Nextdoor is that it’s used by people in geographical communities, which are already reinforced by the big sort of the American population: we tend to live by people that look like us, talk like us, think like us. Nextdoor very quickly became an in-group, out-group place, and it turns out it became rampant with racism.
People would just write things like “suspicious black person entering car.” That was the entire report for the neighborhood watch group. The CEO of Nextdoor is Indian, and he took issue with his company being a hotbed of racism. They basically said, “Well, screw it. Let’s A/B test this. We don’t know how to fix this, but let’s try something. Let’s not do nothing.”
They A/B tested a whole bunch of stuff and crowdsourced their test results. They had humans saying, “Is this racist? Is it not?” They measured that against existing data, with a control group and an experimental group.
They ended up with just two things. One of them was a reminder about racial bias that tells you: remember, race is a thing when you post. They also put in, I don’t know the word for it; if it were a bad thing, it would be called control fraud, but in this case it’s control fraud used for good. They made it harder. Control fraud is when you make the process harder than it has to be.
The point is, you take the process and you make it harder for certain use cases; you manipulate behavior by manipulating the process. If they detected that you mentioned ethnicity in your report, they immediately popped open a form that says: you have mentioned race in your post, you need to supply at least two other identifying things. You can’t just say “Mexican person getting into a car”; you need to tell us what they were wearing, what they were doing, how tall, what age, what gender. You need to supply more information about this person. If you were just saying “dark-skinned person getting into a car,” no, that’s not enough.
What they found was that over 50% of reports involving race keywords got abandoned. The person writing the report “dark-skinned person getting into a car” got told, boom, “You need to give us two more details,” and, boom, just didn’t post it.
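The gate Sam describes can be sketched in a few lines. This is a hypothetical reconstruction in Ruby, not Nextdoor’s actual code; the keyword list and detail fields are invented for illustration, and a real system would be far more careful about false matches:

```ruby
# Crude illustrative keyword check; a production system would be
# much more sophisticated about context and false positives.
RACE_KEYWORDS = /\b(black|white|hispanic|latino|asian|dark[- ]skinned)\b/i
DETAIL_FIELDS = [:clothing, :height, :age, :gender, :activity].freeze

def accept_post?(text, details)
  return true unless text.match?(RACE_KEYWORDS)

  # Race was mentioned: demand at least two other identifying details.
  supplied = DETAIL_FIELDS.count { |field| !details[field].to_s.strip.empty? }
  supplied >= 2
end

p accept_post?("Suspicious black person entering a car", {})
# => false: race mentioned with no other identifying details

p accept_post?("Suspicious black person entering a car",
               { clothing: "red hoodie", height: "about 6 feet" })
# => true: two additional details supplied
```

The design insight is that the gate never forbids anything outright; it just adds friction exactly where a post is likely to be a low-information racial report, and over half of those posts simply evaporated.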
Nextdoor’s CEO said, “Awesome, we didn’t want those posts anyway.” When are you ever going to find a content manager saying less content, less traffic is good? In the case of Nextdoor, that’s what it was.
Charles: We’ve talked a lot about how these algorithms see, or detect, or create these social situations around how we see people. It’s interesting because sometimes these algorithms just bring something up that trigger somebody that makes them sadder, or reminds them of something horrible that happened.
I think on Facebook there were issues where it brought up, “here’s what happened on this day so many years ago,” or whatever, and it showed somebody’s daughter who had passed away, or somebody’s dog that had passed away, or something like that. As you can imagine, that would be fairly traumatic. You’re skimming through there, seeing what your friends are doing, and then you have this right in the middle of it that is emotionally jarring for you.
Carina: Facebook has particularly had problems with this and has had a lot of complaints from people, due to the year-end review, which they do in December, and the service they started, I think within the last year, called On This Day, where they look back, find some previous post, and say things like: three years ago you did this, or here’s a picture of something you were excited about two years ago. Many people report that what it tells them to celebrate is, “Yay, three years ago, here is your dog.” And that’s the dog that died three years ago. That’s very distressing.
With the year-end review, the notorious example was Eric Meyer. His six-year-old daughter had died, I believe of cancer, and she showed up in his year-end review because those posts, of course, were so heavily discussed by his Facebook friends. As one would expect, when the algorithm is looking for things that had a lot of activity, a lot of comments, a lot of shares, unsurprisingly you’re going to find stuff that’s not just happy but is profoundly emotionally affecting in other ways. In this case it was not just very sad to remember; it was something very unexpected being put right back in your face out of the blue. He wasn’t making a choice, as an individual, to browse back through the year and contemplate some of that stuff. It’s someone getting up in the morning and being told, “Hey, congratulations, your daughter died six months ago.” Unintentional, devastating.
Some other examples: that same Flickr incident included one (content warning: Holocaust) where it tagged a photo of Dachau Concentration Camp as a jungle gym. Again, you can be innocently surfing for, “Hey, show me some kids’ playground equipment,” and suddenly get hit with a really unexpected, devastating image.
The interesting thing about this one, and here’s another one of our little guidelines: the photographer knew where they took the picture. The photographer had already tagged it as Dachau. This is a great example of making sure we’re not assuming that machines are smarter than humans. You have to look to the native human knowledge that’s being provided and ask how we can add to it, or what we can derive from it, rather than what we can substitute for it.
David: That’s a really good distinction, thank you for drawing that out. We talked at the top of the show about image processing identifying people as animals. I was waiting for you to say it took a picture of a dark-skinned person and then, not misidentified them by species, good heavens, but projected cultural bias onto them. Like it took a beautiful picture of a beautiful black woman and then put “suspicious” or “welfare” or something on there that projects white-privileged bias onto people.
I was thinking about the distinction between when your machine learning just completely misses the thing, because, like Hewlett-Packard, you’re only training against a certain subset, a certain phenotype of human being, and when there’s stuff that reveals our in-group, out-group biases.
I wanted to pick at that a little bit, but before we do, my question is this. Ignoring for a minute how we ended up with this data, how we ended up labelling a concentration camp as a children’s playground: what do we do? You mentioned algorithmic transparency; what I wanted to touch back on is, what do you do once you’ve put bad data out there? How do you know? Yes, I want to know how to keep the cows in the barn, but as a programmer, I’ve declared bankruptcy on ever keeping the cows in the barn. I always want to know: how do we get the cows back in the barn? How do we tell where the cows are? How do we tell if cows are out of the barn? How do we know when we’ve screwed up? What do we do when we’ve screwed up?
Actually, no, I’m less concerned about the incident management; I don’t want to get into how we fix this one post. But what do we do when we find out we’re labelling people as animals?
Carina: Yes. First, we have to be able to know that. We need to have the reporting before we can even address it. One good thing to do while making an app, whatever it is, regardless of whether it’s photos or anything else, is to make sure there’s a really easy reporting feedback mechanism where users can say immediately, “Hey, there’s a problem here, just want to alert you.” And make sure that you already have policies in place for dealing with common cases.
For instance, if there is an offensive image: Facebook has long had written guidelines on what is an unacceptable image and how to handle it. They have systems in place so they’re not having to react with, “Oh my gosh, what do we do here?”
Similarly as part of the process of deployment, make sure that deployment includes we already have in place policies for dealing with feedback. We know what kind of feedback we’ve gotten because we’ve done data testing that asked those questions. We found what kind of common cases occur. We’ve made sure that form has some sort of open text field to report cases that are not common, that we may not have anticipated, including things like Flickr probably didn’t anticipate that there were ways for the tagging to be incredibly off base and offensive. It is in some ways reasonable that they didn’t anticipate that. Isn’t it so much better to get that feedback immediately on scale privately and be able to deal with it swiftly rather than wait until there’s a big public blow up over it? Having that immediate feedback loop is so valuable.
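The pattern Carina describes, known cases mapped to pre-written policy plus an open text field for the unanticipated, can be sketched in a few lines of Ruby. This is a hypothetical illustration only; the category names and policy text are invented, not any real product’s API:

```ruby
# A minimal sketch of a user feedback report, assuming a plain Ruby app.
class FeedbackReport
  # Common, anticipated cases map to a written policy so responders
  # aren't improvising; :other captures what wasn't anticipated.
  POLICIES = {
    offensive_content: "Escalate to content review within 1 hour",
    wrong_label:       "Suppress the label, queue for retraining data",
    privacy_concern:   "Hide the item pending review",
    other:             "Route to a human for triage"
  }.freeze

  attr_reader :category, :details

  def initialize(category:, details: "")
    raise ArgumentError, "unknown category" unless POLICIES.key?(category)
    @category = category
    @details  = details  # open text field for cases we didn't anticipate
  end

  # Look up the pre-written response policy for this report's category.
  def policy
    POLICIES.fetch(category)
  end
end

report = FeedbackReport.new(category: :wrong_label,
                            details: "Photo of a person auto-tagged as an animal")
puts report.policy
```

The point of the sketch is the shape, not the specifics: every report lands in a bucket with a policy already attached, and the free-text `details` field is how you learn about the failure modes you never imagined.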
Saron: There’s also actually listening and taking it seriously. It’s one thing to have the procedures and processes in place to receive it. But once you get it, do you listen? Do you get really defensive? Do you actually incorporate it into your future decisions?
Carina: Great point.
Saron: I wanted to go back to this idea of machines versus humans. I think a really good example of that was recently with Facebook’s Trending Topics. For a while, they had an editorial team which curated that. There was a lot of discussion about, “Oh, it’s a human team, therefore it must be biased. There have to be issues around it.” They recently replaced it with just machines. It’s all automated at this point.
I think within just this week, they had three news stories that were fake and also not family friendly, I guess that would be a good way to put it.
Sam: As we’re recording this on a Tuesday, I believe they fired their team on Friday, or that’s when it was noticed. By Saturday, Sunday, Monday, several epic fails.
Saron: Yeah, just one every day. Three epic fails. I think it’s such an interesting example of trust-the-humans versus trust-the-machines. I’d love to hear your thoughts and your reaction to that.
Carina: Yeah. You’re correct. Friday at 4:00PM, they abruptly fired the entire team and gave them an hour to get out of the building. By Saturday, there were already notorious examples of epic fail, then Sunday, Monday. There were actually more than three; there were just three that were particularly well reported in the news. It was a broader problem than that.
Essentially, circling back to what was going on: previously, they had algorithms to surface potentially good stories for Facebook’s trending today sidebar. Human editors would look among those and pick something based on, again, long-standing written policies Facebook had on exactly how to go about selecting, how to surface those. There was editorial control in the sense that, with the scaling problem already dealt with, how do we take all these zillions of articles and filter them down to a reasonable number that appear to be trending? How do we then select something of particular quality, reliability, and interest, and make that the choice?
In May, there was an article criticizing that, saying that the humans actually were biased in their selections, that they were biased against conservative stories in particular. Facebook did an internal investigation and said, “We don’t see any evidence of that. However, we’re going to move to completely algorithmic soon anyway, rather than having any dependence on humans.” That was the decision they carried out on Friday. We saw the disasters fall out from that decision.
I think, Saron, you couldn’t have picked a better example of that kind of arrogance. I think part of it is programmer culture within tech devaluing anyone who isn’t a programmer. They didn’t value the knowledge, experience, and human filtering of the journalists they had in their employ.
David: What I love is the hubris of, “We’re going to replace you with an algorithm we haven’t even tested, but we thought about it and we’re sure it’s right.”
Saron: Yeah. That’s a major decision because they’re not just some small startup where no one will notice if things go wrong. They’re freaking Facebook; that’s a really big decision.
Carina: There was an article by someone who’d been on the team. She said that it was habitual. One of the problems they had as a team was that the engineers were routinely changing the algorithm without telling them. This was actually a historical trend for them. They were used to having no communication with the engineers; they actually had policies imposed on them: you don’t meet with engineers, you don’t discuss with engineers, you do not have conversations with people outside your team, you’re just contractors. That isolation meant the engineers didn’t have that feedback and the human editors didn’t have that warning. It’s really easy to have a fail when there’s that kind of gap.
Then of course, there’s the history of arrogance of, “Hey, we can change this anytime, no big deal.” It may be that the only reason it was no big deal was that humans were compensating for problems in those changes. They didn’t know; we don’t know.
Charles: That’s really interesting. I think a lot of the time we think of our coding as happening in a vacuum. We’re sitting at our computer by ourselves, or even if we’re pair programming, we’re sitting there just thinking about the code, thinking about implementing the algorithm, thinking about building this technology that does a thing. We don’t think about the wider social consequences of what we’re doing. It’s something that’s come up over and over again in this episode. At the same time, it’s something that’s easy to lose sight of because ultimately we have a story card on our Agile board that we’re just trying to get done.
Some of these things are pretty serious things. They’re widely publicized because they’re large companies doing it. How do we become more aware of this? We’ve talked about diverse teams but if I’m just working by myself on a project, how do I personally try and be a little bit more mindful of the social consequences of what I’m doing, even if it’s something relatively simple, or relatively small where I don’t have or involve a team?
Carina: I think what I’ve discovered is there’s no such thing as something that’s relatively simple. As soon as you’re dealing with humans, it’s always going to be more complex than what you think it is immediately.
Some of the things that were brought up are applicable even if there’s a one-person shop, or particularly as a one-person shop. Ask more people than are in your circle. Ask a lot more people. Do beta testing that is as diverse as possible: geographically, racially, in gender, in religion. Find as many possible axes of human diversity and make sure you’re hitting them all in some way, even if it’s just a sample of a couple of people per group, so you’ve got a really broad representation of possible users. If you don’t, make sure the release goes only to people who fit the groups you’ve tested with. Maybe you can’t do a global rollout right away because you just don’t have the feedback yet to serve a global audience well. You don’t want pie in your face later, or certainly to alienate an entire group of potential users because they see this as something that is inherently offensive or doesn’t care about them.
That’s one avenue. I think crowdsourcing is potentially another. As a one-person shop it may be really hard to do something like Mechanical Turk; the expense of that can quickly blow up. But we have avenues to reach out for knowledge beyond our own. It’s incumbent on us to try.
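Carina’s point about limiting a release to the groups you’ve actually tested with is essentially a rollout gate. Here is a minimal Ruby sketch under that assumption; the region codes and method name are hypothetical, not any real feature-flag library:

```ruby
# Gate a release to regions actually covered in beta testing.
# Hypothetical region codes; in practice these would come from
# wherever your beta-test coverage is recorded.
TESTED_REGIONS = %w[us-west eu-west br-south].freeze

# Only users in a region represented in beta testing see the feature;
# everyone else keeps the old behavior until testing catches up.
def eligible_for_rollout?(user_region)
  TESTED_REGIONS.include?(user_region)
end
```

The design choice is the interesting part: the default is exclusion, and a region earns its way into the rollout by being tested, rather than a global launch with untested audiences opted in by default.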
Charles: I have a follow-up question to this. What you’re saying makes sense, but I know several people who, because they don’t completely understand all of the issues involved with different groups of people, or with diversity or inclusion, just feel like they don’t know or don’t understand. They’re not quite sure how to approach other people about some of these things.
It feels like a lot of times people just hear, “Well, just get over it, because it’s the right thing to do.” Are there ways to break down those walls for yourself without it being overly uncomfortable, or without it being a wall that you’re just going to throw yourself against over and over again? I don’t know if I’m asking this well. I know people who have these barriers that they put up for themselves. They just throw up their hands like, “I don’t get it.”
Saron: Sorry, can you rephrase that cause I don’t think I understand.
Charles: A lot of people will just chalk this up to diversity or inclusion being hard. Then they throw up their hands and it’s like, “Well, I’m never going to understand it, so I’m not going to try, even though I feel like it’s the right thing to do.” How do you get people past that?
Sam: I feel like the answer to that is kind of the same as the answer to some of the algorithmic fails that we’ve been discussing. The answer to both is really you as a human being and you as a software professional, you have certain responsibilities and obligations, you should have a certain set of ethics that require you to confront things even when they are uncomfortable.
Charles: I agree. I also just want to push a little bit on the idea that it’s okay to be wrong. You’re going to figure it out as you’re picking this stuff up. I think we’ve all made mistakes along this road. Just keep getting better.
David: Yeah. Diversity is hard but it’s also important. Never give up. It’s going to stay hard; it’s never going to get easy. It might get easier, but it won’t ever be easy. It’s always going to be important.
Saron: I just want to push back, this isn’t a diversity conversation.
Charles: No, it’s not.
Saron: Right, it’s not. This is about algorithms. We’re all affected by algorithms. It’s not an issue of, “Oh, let’s protect the black people. Let’s protect the women.” It’s not that. I just want to be very clear that there are plenty of examples we haven’t talked about. Sam, you mentioned the Fitbit example in our messages, if you want to bring that up. There are plenty of examples that show that this is about all of us. We are all affected by it in small ways and in very big ways. Let’s just not forget that we’re in this together in many different ways.
Sam: Yes, thank you. I was thinking that the Fitbit example would be a much clearer example of where professional ethics come in. Carina, do you want to summarize that one since it was in your talk?
Carina: Sure. Fitbit is another one of these activity trackers. In its early days, it was a social activity tracker. The idea is to share how many steps you’ve taken, what your current weight is, what kinds of activities you engage in, with the idea of gamifying that socially. My friends did a little bit more than me, so I’m going to try harder, etc., some sort of motivator.
In its early days, Fitbit had a feature that included tracking your sexual activity, which, as a sex educator, I’m kind of delighted by. That is a legitimate physical activity that burns some calories. The question is, did everybody intend the social ramifications of that? Fitbit treated all data as equal. By default, they shared all data, including the sexual activity information. Not everyone was aware of that. Search engines started turning up people’s profiles full of information that I think most people would consider private. Some people are fine with that, but I think it’s an unreasonable default to look at all data as equal and say, “Opt out if you don’t want to share your sexual activity with the world.” You need to look at each data point and ask, “What is the reasonable default for sharing this one?” one by one.
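The per-data-point defaults Carina argues for, as opposed to “all data is equal, opt out if you mind,” might look something like this in Ruby. The field names and visibility levels here are hypothetical, not Fitbit’s actual schema:

```ruby
# Sketch of per-field sharing defaults. Sensitive fields default to
# private and require an explicit opt-in; nothing sensitive leaks
# just because the user never visited a settings page.
DEFAULT_VISIBILITY = {
  step_count:      :public,   # low sensitivity: a reasonable default
  weight:          :private,  # opt-in only
  sexual_activity: :private   # opt-in only; never public by default
}.freeze

# Returns the fields visible to the world after applying any explicit
# user overrides on top of the conservative defaults.
def shareable_fields(user_overrides = {})
  DEFAULT_VISIBILITY.merge(user_overrides)
                    .select { |_field, visibility| visibility == :public }
                    .keys
end

puts shareable_fields.inspect                    # defaults only
puts shareable_fields(weight: :public).inspect   # explicit opt-in
```

The key decision is deciding the default for each field individually, rather than writing one rule that treats a step count and a sexual-activity log as the same kind of data.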
Sam: That’s one where I think the ethical obligation is pretty clear, especially in a culture where we have at least a default expectation of monogamy. If somebody is outed by their activity tracker as having had sexual activity when their primary partner was perhaps not involved, that’s going to have some interesting implications for that user’s life. Whether or not we want to be enabling people to hide their affairs, that’s another question. As people with access to that data, deciding what to do with it, we have a responsibility to at least treat it carefully.
Sam: To our listeners, we just want to give you a quick content warning that we’re about to discuss issues of rape.
Carina: Tying the Fitbit example in with the earlier examples we discussed about policing, they actually have a bearing on each other. There was recently a case in which a woman claimed that she was raped. Prosecutors ended up accusing her of falsifying that claim. They based that accusation on her Fitbit data: they didn’t believe she was raped because certain data on Fitbit suggested her activity was not consistent with the accusation.
There’s a big problem with this. First of all, anybody who’s used a Fitbit, or any of these activity trackers, can already tell you they’re not totally accurate. This was extraordinarily well proved earlier in the year, when there was an example of using a Microsoft Band, which is very similar to a Fitbit, to get a heart rate of 120 beats per minute from a dead chicken.
Yet they’re saying, “This is so reliable, we can use this to prosecute someone.” She ended up being ordered to serve, I think, two years of probation based on that Fitbit data, which we know to be very imprecise and not entirely reliable. It’s almost more for entertainment purposes than it is for precise data. To have legal consequences extrapolated from data that unreliable, data that will report a heartbeat from a dead chicken, you’ve got a problem.
We can’t start tying together sets of unreliable data and assuming that because they draw on lots of big data, the technology can know better than the real world. We’ve got to be really, really careful about allowing these things to be pulled together.
David: Big doesn’t mean accurate.
Charles: Alright, well. Is there any kind of parting thought we want to wrap this up with before we go to picks, other than that what we do has impact beyond just what the software does? It all interacts with the world and it can affect people in real ways.
Carina: I think you can count on it to affect people in real ways every single time. Everything that we do counts. It’s personal.
Sam: Before going for the pithy quotes, how about: making the world a better place doesn’t necessarily mean finding more efficient ways to pick up your dry cleaning.
Charles: Alright. Well, let’s go ahead and do some picks. David, do you want to start us off with picks?
David: Yeah. I’m just going to be really quick. I’m reading 99 Bottles by Sandi Metz and Katrina Owen. I can’t pick this book hard enough. It is absolutely knocking my head right off my shoulders. It’s freaking amazing.
Get the book, do the exercises. Actually put the book down and go do the coding exercises, and you will be betrayed by your own biases. If you just skim over the exercises and get to the end of the chapter, you will go, “Okay, yeah. Well, that’s the thing,” and you will completely retcon your own behavior to not be biased without doing the exercises. They nailed me to the wall. 99 Bottles by Sandi Metz and Katrina Owen. It’s awesome. It’s so good, we need to have them on the show.
Charles: In fact, we should actually call that out. We are doing a book club on 99 Bottles. We’re going to be recording it at the end of October, which means it will come out sometime in November.
Charles: Saron, do you have some picks for us?
Saron: Sure, I have a couple. The first is actually just piggybacking on what David said. We’re reading 99 Bottles for our book club. If you’re interested in reading along, you’re more than welcome to do so.
For my own two picks, one is the Vivaldi browser, which I don’t know if all of you have heard of, but it is awesome. I installed it yesterday and started using it. It’s really, really good. I believe it’s by the makers of the Opera browser and it’s based on Chromium. I think their tagline is “the browser we made for our friends,” or something like that.
The idea is taking a bunch of Google Chrome extensions that they really liked, things they wished their browser could do, and building them right into it. One of the things I love about it is the note-taking feature. You get this nice little notepad on the left of the browser. As you’re going through different websites, you can pull out quotes and links and keep your own organized notepad, which makes it super, super easy to do any research. I highly, highly recommend Vivaldi.
My last shoutout is Magnetic Sticky Notes. I think they’re created by a company called Tesla Amazing. I saw the demo for it; I can hardly believe it works as well as it seems to, but I’m going to buy it and try it out. Basically, it’s sticky notes that stick to anything. You can use marker, pen, pencil, whatever you want. You can stick it on any surface: on stone, on wood, on plastic, whatever. It comes in little sticky-note sizes and in big paper sizes you can put on your wall. The back of it is made of whiteboard material. The whole thing is just magical. If you’re interested, check out Magnetic Sticky Notes. Those are my picks.
Charles: Very cool. Sam, what are your picks?
Sam: I’m just going to pick one thing today, and that is the Oregon Shakespeare Festival. This is in Ashland, in southwest Oregon, almost all the way down to California. It’s a little hard to get to from other parts of the country, but it’s well, well worth the trip.
They have a season that’s something like eight or nine months long, I think. Basically, every time I’ve gone to see a play there, it’s been the best production of that play I have ever seen. We went this year and saw Twelfth Night and The Wiz, and both were amazing. If you can manage the time and travel, and you like Shakespeare, or theatre in general, they do probably about half Shakespeare and half other stuff. I cannot recommend it highly enough; it’s great.
Charles: Awesome. I’m going to go ahead throw a couple of things out there that I’ve been working on. One is if you want the recordings for Ruby Remote Conf, those are now available. If you go to devchat.tv, click on Conferences, you can get the recordings for that. Rails Remote Conf is coming up. If you want tickets to that, you can also buy those.
I’m doing some webinars for the Get a Coder Job book. If you’re trying to find a programming job, especially if you’re a new programmer, that’s the focus of these. We’re pulling those together. I’m doing a bunch with just the material from the book, and then I’m doing one with Joe Mastey, who’s going to talk to us about apprenticeship programs. I’m also doing another one with Josh Doody, who we had on a few weeks ago; he’s going to talk to us about salary negotiation. If you want a live discussion and presentation with Q&A and all of that good stuff, go check it out. You can get all that information at devchat.tv/webinars. Carina, what are your picks?
Carina: Okay. Mine are two books. One is anything by Howard Zinn. He has a whole series of books, principally A People’s History of the United States. If you’re interested in getting a much broader perspective on people in the US, on history, on generally diversifying your perspective, they’re fantastic books with really detailed history. There’s that maxim that history is written by the victors. These are the stories of the people who weren’t the victors, telling history from their perspective. I think that’s fascinating.
I’m an avid cook and baker, and I love Harold McGee’s On Food and Cooking, which is essentially an encyclopedia of food science. It’s not recipes; it’s everything you’d ever want to know about food science so that you can make your own recipes. Amazing. I love it. It’s super thick; it’s a reference forever.
Charles: Very cool. If people want to find out what you’re up to these days, go follow you around the conferences, and things like that, what do they do?
Carina: Twitter is where I’m most active. I use Twitter to blog. You can find my latest conference schedule on my website at cczona.com. My next couple of conferences are EuRuKo in September, Lean Agile Scotland in October, SCNA also in October, and November, GOTO Berlin. I’m pretty excited about all of those.
Charles: Very cool. Thank you for coming. It was really interesting just to think about the impact that our code has outside of our teams and our companies. Thank you for making us think about this.
Sam: Thank you.
Carina: Thank you for having me. You also brought up some examples I didn’t know about. I’m loving this.
Charles: We’ll catch you all next week.