Nov 6, 2023 · Episode 9

Like a Fitbit for engineering teams with Grant Jenks from LinkedIn

Grant Jenks is a Senior Staff Software Engineer at LinkedIn where he works in the Insights & Analytics team — or the Fitbit for engineering teams.

Show notes

In this episode of Engineering Unblocked, Grant and Rebecca discuss how LinkedIn approaches the challenges of keeping its software engineers (and others) happy and productive, and how the Engineering Insights organization informs its work and the work of teams across LinkedIn.

Timestamps

(00:00) Introductions
(01:00) Grant’s current role and his career journey
(03:51) The origins of the productivity organization at LinkedIn
(05:22) From building tools to maintaining critical development infrastructure
(06:40) Incorporating commodity tools
(08:06) Choosing which problems to solve
(09:21) How the team’s metrics inform work across LinkedIn
(12:05) Using the metrics to help teams set goals
(13:30) Choosing the right metrics for the problem
(15:32) Unique user problems at scale
(18:07) Different problems and different perceptions for different personas
(23:09) Working with productivity champions at the team level
(23:40) Defining “happiness” and soliciting feedback
(28:05) Spotting trends in the sentiment data, and choosing the right cadence
(30:26) The product is productivity, and users can do surprising things
(34:00) Making change happen at scale
(38:33) Using metrics in a productivity emergency

Transcript

Rebecca: Hey Grant, how’s it going?

Grant: Good. Glad to be here, Rebecca.

Rebecca: I’m so glad. It’s taken us a while to get this on the books. All sorts of calamities have gotten in the way, but I’m super excited for you to be here. It was months ago now, probably, that I reached out to see if you might want to come join us. It was because of a post that I saw on LinkedIn, which happens to be where you work. So good job!

We’ll talk about that post about umbrellas a little bit later on, but first, just tell me a little bit about you, how you arrived where you are, and what you do.

Grant: I do work at LinkedIn now. I’ve been here for about four years. I work in an org focused on productivity and we’re really excited about productivity and happiness — so generally those are aligned, but not always — and specifically within that, I focus on a lot of developer productivity, although you may also hear me use the term worker productivity. So there are all kinds of legal nuances between full-time, vendor, contractor, and whatnot. Basically, we focus on anyone who works at LinkedIn in some capacity and in some way. The productivity org serves folks in engineering, sales, HR, support… all different parts of the org. But generally focused on developing software for those different areas.

So how I got here? It’s kind of strange actually, I haven’t always done productivity. I worked for myself. I had my own company before this. I worked in advertising and analytics, which is interesting because here I feel I still get to do analytics.

Rebecca: I was going to say!

Grant: I do a little bit less of the advertising, although it taught me a lot about people’s mindsets. Advertising works more powerfully than most people suspect and even smart people — like developers who tend to really exist in their minds — it works particularly well on them. Before that, I worked at Microsoft and I was in an incubation project as a compiler engineer. What’s neat about my current role is that I have a set of experiences in compiler and tools engineering — really focused on developers — and a set of experiences in advertising and analytics, and I have found the intersection of the two.

In this role, I do a lot with what we call insights and analytics, and we try and make teams — one of the analogies we use is we’re like a Fitbit for engineering teams, so trying to make teams — happier, healthier, and more productive.

Rebecca: And giving them that feedback — maybe not literally on their wrist, but still.

Grant: Yeah, or not a wearable, but we’ve thought about it.

Rebecca: I was going to say, have you thought about it? It could be a whole new market. Wearables for developer productivity. It’s so interesting how all those different experiences can add up to this “Oh, I fit here exactly!” even though I didn’t necessarily prepare to fit here exactly.

So that makes a lot of sense. I’m sure that this work was going on before you got to LinkedIn, but what do you know about how this came to be — especially the whole productivity organization? I’ve seen productivity within engineering — enablement is another word that I’ve heard — when you’re talking about things that span perhaps beyond engineering… but a whole productivity org? I love it, and I haven’t seen it.

What do you know about how you got there?

Grant: If you go far enough back in LinkedIn’s history — and we still have a bunch of things named this because names tend to stick around longer than people, longer than initiatives, longer than anything else.

Long, long ago, we were a tools org. So, if you shrink us down enough and go back in time, we built tools for everything in the development pipeline. We built tools for customizing your IDE, tools for code collaboration — how we did code reviews — tools for the CI pipeline, tools for deployment. Tools, tools, tools.

And I think after that, it morphed at one point and was rebranded as Foundation. So there was this core engineering aspect of it where we thought “Hey, in order to ship, you need all this stuff working.” Developers aren’t still using all of these things just as tools. Developers now rely on this as critical infrastructure.

When I first joined, I didn’t join the Insights and Analytics team — it didn’t actually exist quite yet in that form — I joined the CI Tools team. One of the funny experiences there is that to sometimes get ourselves unblocked — if there was a CI tools incident — we needed to get something through CI, but we were the team directly responsible for CI. So if we had borked it, it was like “Oh shoot, we need to go smash the glass and merge this right now,” and drop back to the old tools commands where you needed the expert or the most senior person who is like: “I remember all these commands, out of the way. And I have some super user credentials that will let me smash the glass and do this.”

So we started as Tools and morphed into this Foundation/Core Engineering. From there we moved more toward a specific focus on developer productivity and happiness, and then from there, iterated a little more broadly toward just productivity and happiness.

Rebecca: So you joined four years ago when there was a lot of commodity tech around a lot of the things that you just said. But LinkedIn has been around for a while, and when it started, a lot of this stuff didn’t exist. How much are you even still today your own custom ball of greatness? Or have you managed to bring in commodity, off-the-shelf tools for some of this stuff? I’m just curious what that looks like because I think that’s always interesting when you have a company that is of a certain age that didn’t get to just buy something off the shelf in 2023.

Grant: The market dynamics have certainly shifted dramatically over the last 10 years. It used to be that things like having centralized code review were a competitive advantage. And now with things like GitHub, you look at it like those are table stakes; everyone has that.

We’ve seen the same kind of transition with CI where everybody’s using GitHub Actions. If you do anything less than that you don’t seem to have any competitive advantage. You just have tech debt. I would say that we’ve made a lot of progress in that direction and we have tried as much as possible not to fall behind the curve when it comes to what people are doing in open source, or at other companies.

That said, I think one of the challenges here is that some companies operate very differently. If you go all in on things like a monorepo and you’ve got a billion lines of code in there, you have a dedicated team to make that work. We don’t see that in open source very much, right?

Rebecca: Right.

Grant: You’ve gotta be able to break things up and it’s debatable to what extent that is a killer feature. You really seem to transition. You can have this set of problems or you can have that set of problems. How do you wanna fund it? We go back and forth on that a lot internally. There are vigorous debates around whether we are making the right set of trade-offs, or whether we can fund an alternate universe where LinkedIn looks blank internally. I think those are part of the growing pains that any company goes through.

Rebecca: I’ve always found it interesting how those early decisions can impact the choices you have today, and it’s interesting to hear that you’re trying to embrace modern practices.

It’s super true that you can’t go look in on what somebody else is doing with their billion-line monorepo, right? That’s not just sitting on GitHub for you to peruse through and see what you can see.

How does the work that you do around gathering metrics and evaluating happiness inform the work that gets done, and who does it?

Grant: That’s a great question. So it’s definitely not us doing all that work. At the beginning of this team, I said two things. First, I really didn’t want to be Clippy: we don’t want to build this product bot that’s “You look like you’re trying to do blank, can I help you?” Both tool and service owner teams and individual developers would feel irritated by that. They’re like, “Can you just make it better? Don’t bolt something onto it.”

And then the other analogy I wanted to avoid — and I’ve never worked in a police department, but I’ve watched enough movies that I know that they have something called internal affairs, this is the police of the police of the police — and I didn’t want to be that either. We can’t run around saying you’re out of line, your numbers aren’t good, we’re going to tattle on you, or we’re going to hold you accountable… When in reality we’re peers with these tool and service owner teams.

So really what it’s looked like for us, by and large, is a high-touch partnership engagement model where we need them to make some commitment up front that they’re going to value the work that we do, and we need to make some kind of commitment to them ourselves: do we think we can deliver on the questions that they’re asking?

We have a pretty easy litmus test for some of that. Generally, a team comes to us “Hey, we’re trying to do blank” and we’ll just come to them and say “Okay, let’s jump forward in time and imagine we come to you and give you data that says you should do “blah.” Are you going to do it?”

And if right there, they’re, “No, we would hope that you come back and tell us to go do this instead.” We’re like “Well then, have you basically made up your mind? Because if you’ve made up your mind, it doesn’t sound like you need us.” We’re here for the ambiguous problems.

Now that can easily get overridden. People will still be like, “But we still need those numbers, we’re pretty confident this will be helpful.” There’s some scale here. If it’s fairly low-cost to go and get those numbers, let’s just do it, but if it’s really high cost and it’s involved, we have to make the right trade-off.

Rebecca: How do you — or do you — help those groups set goals? Did they use your metrics to set goals around how they want their tools to be received, and how they want the experience they’re building to be received?

Grant: To tell you the truth, here’s one place where the time dimension plays an important role. It is almost always the case that they come to us first and they have a set of metrics. They arrive, they have a document — multiple documents — they have OKRs, they have something that says “This is the thing that’s most important.” And to be honest, it’s usually terrible.

It’s usually something like, “We’re going to measure the number of builds that happen because we’re in charge of this build technology.” So if it happens a lot, then our team is important. I don’t think that’s the way you want to phrase it, and I’m being a little dramatic here, but typically we classify some metrics as business metrics where it’s probably worthwhile to track things like engagement and usage.

But if you work at a company like LinkedIn, we have one system for code review. People aren’t choosing you over competitors. They’re coming to you because this is their only option. So we go back and forth on that a little bit and usually try to come to something.

For us, a productivity metric is much more worker-focused and developer-focused. It’s like, “Look, developers have a problem that they need to get code merged. How are you helping them with that problem?” You can choose a big platform like GitHub, and that’s been a great platform for us. But you have to keep focused on your problem. Don’t just become focused on your platform.

Platforms, build systems… these are all implementation details at the end of the day. We trust you — or here I’m speaking to the tool and service owner teams — we trust them to make the right decisions. If what’s best for the company is for you to build your own bespoke thing, do that. Ultimately they will be held accountable for those decisions.

We try not to make those, but it is usually a back-and-forth haggling, negotiation, and agreement. There are multiple iterations and it usually goes down the hierarchy first; somebody turns to a Staff Engineer, Senior SWE, Junior SWE: “Why don’t you think about this?” Then they come back, and engineers are often fairly terrible at this. They’ll always come back and say: “Uptime is the most important metric”. But that one’s table stakes. You’re not measuring the direct impact on a worker in the task that they’re trying to get done.

Then it usually rolls up and we get to senior leadership and they’ll ask broader questions like, “Is this really even a thing? Do developers have to do this?” Or sometimes they have really strong ideas like “I want to measure this very specific thing”. And you say “Well, it’s part of a bigger picture.” Usually, we have to go up and down the hierarchy, forward and backward in time, and then we land in some place that we feel we can measure that and it’s really meaningful.

Rebecca: I have had the blessing (or curse) of working at places large enough where I know intuitively some of the problems that you’re talking about. But for people who were like me ten years ago and had never been exposed to these kinds of problems, can you talk in some specifics? When you’re talking about the folks who own the code review system, what are some specific user challenges that you see in that system that that team has to work to address?

Grant: One of them that’s been — I think subtle — in the code review system, and is common to see in research and literature, is that people measure things like how long it took this PR to get merged; we call that total time. Or how long until the first response to your code review? How long until a reviewer gave you some initial feedback?

When we’ve looked closely at the system, one of the things that has stood out to us — that’s a little different — is around notifications. You may measure things by how long it took the PR to merge, but what happened after that? Was the developer notified? Like, your code merged, it started CI, CI passed, you’re good to go, and it’s rolling straight to production. Did they receive all those notifications? Or was there some delay in the system? Did something break at some point, and how quickly were they told this broke?

Oftentimes with code reviews, we have to go through multiple rounds. Like, I tell you, “I need you to refactor this,” and then you refactor that. How long until I was told you refactored that? If another developer comes in now and reviews the code and brings up a different issue, is that helpful? Trying to track all the different notifications that could go off, and getting the right people involved, has been a real challenge in building a code collaboration system that feels genuinely collaborative, not just “Hey there’s SOC compliance issues, that means someone has to look at everything so we're just checking the box.” We believe that it’s more than that; it’s not just a box we check. It’s something that is a learning experience, a quality experience, and all these different things. So let’s make sure that it functions that way.
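Editor’s sketch (not from the episode): the timings Grant describes here (total time, time to first response, and how long a developer waits to hear that their code merged) can all be derived from per-PR timestamps. The Python below is a minimal illustration under stated assumptions. The `ReviewEvents` record, its field names, and the metric names in the returned dictionary are hypothetical and are not LinkedIn’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReviewEvents:
    """Hypothetical timestamps for one pull request (illustrative fields only)."""
    opened: datetime
    first_response: Optional[datetime]   # first reviewer feedback, if any
    merged: Optional[datetime]           # when the PR merged, if it did
    merge_notified: Optional[datetime]   # when the author was told it merged


def elapsed(start: Optional[datetime], end: Optional[datetime]) -> Optional[timedelta]:
    """Return end - start, or None when either event has not happened."""
    if start is None or end is None:
        return None
    return end - start


def review_metrics(ev: ReviewEvents) -> dict:
    """Compute the three timings discussed above for a single PR."""
    return {
        "total_time": elapsed(ev.opened, ev.merged),
        "time_to_first_response": elapsed(ev.opened, ev.first_response),
        # The less commonly measured one: delay between merge and notification.
        "merge_notification_delay": elapsed(ev.merged, ev.merge_notified),
    }
```

Returning `None` for events that have not happened keeps open or unanswered PRs from being counted as zero-length waits when the numbers are aggregated.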

Rebecca: A lot of what we just talked about just highlights that at a certain-size company, you get a lot of advantage out of making these small improvements to systems that everyone uses, right? I’m guessing that a lot of your energy goes toward identifying those opportunities. Do you do anything that focuses on helping people on the individual team-level, helping teams be more productive, and improve their processes, or has that ship sort of sailed and your focus is much more on the org as a whole?

Grant: No, that’s very much part of our focus. Any time we engage a frontline manager, a senior manager, or a director, one of their first questions is, “Is the team working together well?” There’s this great example. We built this tool called Insights Hub which was this productivity portal that has a set of dashboards and curated metrics. We started trying to use it ourselves — the first thing we should do, use it ourselves — and then we started seeing certain patterns you can identify. Portions of the org where, for example, build time is really long, and you start asking yourself like, well, what’s happening there? And it’s funny because when you talk to JavaScript engineers who are mainly working in the browser, and they’re used to a very fast iterative feedback cycle, they’re gonna refresh the page, and if reloading the page after they hit save takes something more than a few seconds — like if it takes 20 seconds — they are ready to flip their desk and be like, “This is completely broken, I am totally unproductive here!”

And then you talk to mobile developers who are working in Android or iOS, and they describe build latency like “Yeah, I come in in the morning, I get a fresh build. That first build could take an hour, and that’s pretty good because it used to take two hours.” And you’re like, “Oh my gosh, an hour to do what? That’s insane!”

We have a lot of mechanisms to try and improve that. But you start to realize that across the org, there are very different standards and expectations of what people are used to. If you’re in senior leadership, a senior director or VP, and you’re just looking at the numbers, it doesn’t really paint the right picture.

You’re like: shouldn’t everything happen in five seconds? What is it about this over here that can’t happen in five seconds? You’re telling me that those developers that wait an hour are really happy, and these developers who wait 20 seconds are willing to leave because they say 20 seconds is too much? How am I supposed to make sense of this?

Rebecca: How do you make sense of that?

Grant: Trying to contextualize it. One of the ways we do that is a framework called Personas. We have a persona called web developer — front-end web development. We have another persona for an iOS mobile engineer, and they have very different expectations. Calibrating that and communicating that through a central portal is really important. There was a great example too, on a team where a frontline manager was looking at people’s productivity and they were looking at people’s experience and they noticed, “green, green, green, red.”

And they’re like, huh, what’s that? I know that that person’s working on mostly the same stuff. So what’s happening? And they stepped in, “Hey, I noticed you’re having this behavior. Is everything okay? What are you doing?” But in this particular case, they went to them and that developer had actually just started on that team. They had a long tenure at LinkedIn, but they had just started on that team three months ago, and they said, “Well, I'm following the runbook, I’m following the onboarding docs, and I don’t know, your team seems strange? Like five-minute builds seem to be the norm.” And they say, “Well, that’s funny, ’cause I’m not seeing that elsewhere. So let me pair you with somebody who’s been on the team longer.”

And immediately they started discovering, “Oh, I wasn’t doing things the way that the rest of the team was doing it.” They had come from more of a backend Python development background, and now they’re working in a Java space. So they had fallen back to some of the higher level commands and they didn’t know like, “Oh, I can invoke Gradle and I can turn off these phases, and if I’m trying to get that rapid iteration, that’s what I want to do. And there are some basic IDE settings I need to flip on.” It’s amazing how it can be a set of small things. Like “there are 20 additional characters for you to put in your command line that will make you five times iteratively faster.” It’s easy to overlook those things.

Through some of the portals we’ve developed, we think of those teams as one of our primary customer focuses. Within those teams, too, we have this concept of something called a productivity champion. So if the team is of a large enough size, there’s typically someone on the inside who says, I’m going to make things better, and we typically just need to empower them. If they come to us and ask a question, we should put that right at the top of our list. They have really good insights or they’re very close to the problem. So let’s just get them unblocked and see what they can do for the broader org.

Rebecca: I promised that I was gonna ask you how you measure happiness. What does happiness mean to you, and what does it mean in this context, and how do you quantify it?

Grant: We get a few different signals for happiness, but the biggest one by far is just a qualitative signal: we ask developers through surveys and feedback, “Are you happy? Do you feel productive?” Generally, nine times out of ten, whatever quantitative data we have will agree with the qualitative.

It’s fairly rare actually for developers to say, I’m really happy, and for the numbers to be “No, you’re having a terrible experience,” and vice versa. If the numbers say all your builds are incredibly fast, they come back, “I'm so unproductive here.” Typically we want to listen to them actually before we listen to the telemetry.

The simple answer to that question is we ask people. The complicated answer is how do you ask people? So we have a set of surveys that go out periodically (they could be quarterly, annually, bi-annually...). We have a set of in-product feedback, so when you’re in a web portal, like you’re trying to manage the permissions, say, for a system, there’s a little feedback widget on the side of your screen that says, “Do you want to tell us right now how we’re doing or how that worked?”

We also have a set of instrumentation points where we say “Oh look, you just completed a workflow.” So, we might solicit feedback out-of-band — we call this “out-of-band feedback”. We might go to Slack and say, “Can you tell us how that CI workflow went? Are you happy with it?”

We might go through email or something else like Teams and try and solicit feedback. So there are lots of different sources of feedback from folks (surveys, in-product, out-of-band), and then we have to try and aggregate all of that. So there’s a metric design question as well. At the top level, we have net satisfaction or overall developer satisfaction. And then on a per-service or per-tool basis, we try and develop customer satisfaction scores.

The one trick there that I think is very important: people will often come and they’ll say, “Well, my team is in charge of these five different things. Why don’t we just survey people, whether they’re happy with my team?” And that’s not what you want. We're not trying to build some kind of peer review system or something that could be toxic where you’re like, tell me how I’m doing. Right? We’re not trying to build Uber where you rate your Uber driver. “Well, I give you three out of five stars. I can’t believe you ran that red light. That was so dangerous.”

So we have to shift the conversation to, “Well, can you tell us about more specific things? Tell us, was CI fast? Are you satisfied with how quickly the CI completes? When you get an error message, are you satisfied with how quickly you’re able to understand why that error message came?”

Then we also cross reference that feedback with other indicators — here being like support tickets. If some system goes down — the one that people will notice very quickly is that if, for whatever reason, you can’t check out the code — a ton of support tickets open up all very quickly. We need to go and flash on different portals and maybe send emails and let people know, “We’re aware of an incident, we’ve already dispatched people, please stop filing tickets.” We’re just going to mark all of them as duplicates of the first one.

One of the more interesting things is when we can cross reference those things. If we find our availability around this system was really low last quarter and feedback related to that system was really low — like developers telling us it was frustrating. You might have only one developer tell you that, but you have to multiply that by 50 or 500 where you’re like, “Oh, actually, we can see a lot of people were impacted by that, so let’s amplify that feedback.”

Rebecca: How often are you looking at that sort of data? You’re collecting some feedback data in approximately real-time with the button that people can click and then there are the surveys, etc. How often are you looking at that data and trying to see — I’m assuming if you’re looking at it over time, you’re trying to see — trends? Are we doing better? Are we doing worse? How do you use that data responsibly to evaluate yourself and to decide what you’re going to do next? Especially sentiment data.

Grant: By and large, I would say it just ends up being quarterly. A lot of it will align with the survey that we run and the survey is periodic. People align their planning cycles, managers make room for it in their schedules, and they’re like “Okay, we’ve got to gear up: we’re headed into figuring out a roadmap here, let’s go listen to people and see how big an issue is X, Y, and Z.” So, quarterly. But there are certain areas — one area — that we’ve had to refine and change. It is when you have something that you’re like, “We’re really not going to change that. We hate to tell you, either that system is getting sunset or it’s a legal requirement, and we’re sorry, we’ve evaluated the landscape and there is nothing better we can do. Nobody’s ever going to look at that feedback. Let’s stop collecting it and asking you about it.”

Then on the other end, there are people who — or teams where they — are developing something brand new and they would love to get feedback every week. “We just released a new feature. Tell us how it’s doing.” Oftentimes, we don’t quite have the scale actually to make weekly feedback meaningful, so monthly or quarterly is much more substantive.

In metrics design, one thing we focus on a lot is providing confidence intervals or error bounds. So we might say, “Oh yeah, the satisfaction score was blank, and we think it’s plus or minus blank.” For us, I don’t know how to map this to letter grades, but if you get a B or higher (think 80 percent of what’s possible or higher) you’re doing pretty well. If you’re in that C–D range, it’s worrisome, like “Oh, that’s not what we want.” And then anything below D you’re like, “Oh, that’s not good. We probably need to do something here soon, sooner than later.” I think that kind of makes sense.

Rebecca: I did want to also come back and talk about the post that you wrote that got me interested in talking to you in the first place, because I thought it was just such a great analogy for the challenges in this space and for the fact that, ultimately, you’re building a product and the product is experience, the product is productivity, right?

In that, you talked about giving people umbrellas and then learning some fascinating ways that they decided to use them. Talk to me a little bit about your experience with thinking about the space as a product with users and maybe some of the surprising things you have learned from users of your product.

Grant: There are so many examples. I’m reminded this morning of one of these webcomics where there’s some bug in the software and the bug causes someone’s machine to overheat and they fix the bug and then they get an error report later where somebody is like, can you actually put that back in because I was using that to heat my room and I’m annoyed that you took away my bug.

We know that in software whatever you expose to the real world, somebody will come to depend on it. It’s often not enough to be feature-for-feature compatible; you have to be bug-for-bug compatible. Whatever mistakes the existing thing made, you must copy those as well. We’ve seen that a lot in terms of what people do inside build systems. I think it is one of the places where we often have to kind of untangle things. It’s like, “We gave you a build system, not a Makefile, and I know we can tell sometimes, you had more experience with Makefiles, and so you did a lot of things that are fine in Makefiles, but they’re going to ruin the cacheability of your build. They’re going to ruin hermeticity. They’re going to ruin our ability to stamp things and be confident that that’s exactly what happened.”

We were just joking the other day that we should check, after we run through CI, that none of the source code changed while it was running. One of the more senior engineers immediately started to get flushed in the face like “Are we really going to do that? Because I’m sure things will break.” We thought, that’s not good, and I don’t know if we want to look under that rock, right?

Certainly, build systems are like that. Deployment systems can also quite often be like that, where to get this deployed, I have to warm this cache, and before I warm that cache, I have to make sure these configs are in place. And to get those configs in place, I have to get my secrets and...

There tend to be these long chains: “Oh, there’s a set of requirements here, and when I was setting things up, I took a shortcut.” And you’re like, “Oh, okay, well, shortcuts are problematic. We understand why you did it at the time, but now, all the metrics we measure here are wrong for you, so we need to undo that. Or you’re missing out on this new thing we developed, blank, and you can’t use that. Like you can’t do auto-remediation, you can’t do auto health checks because that’s not how you built your system, and we want you to come back into the fold.”

Those are the two areas that give me the most, “Oh, gosh, I remember some of those issues.” But honestly, everywhere. This is the nature of software development; anything you expose publicly or to external customers, they will come to rely on and it’s hard to evolve over time.
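Editor’s sketch (not from the episode): the check Grant’s team joked about, confirming that no source file changed while CI was running, can be approximated by hashing the source tree before and after the run. The Python below is a minimal illustration. The function names `tree_digest` and `source_unchanged` are hypothetical, and nothing here reflects how LinkedIn’s CI actually works.

```python
import hashlib
from pathlib import Path


def tree_digest(root) -> str:
    """Hash every file under root (relative path plus contents) into one digest."""
    root = Path(root)
    h = hashlib.sha256()
    # Sort so the digest does not depend on directory traversal order.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h.update(path.relative_to(root).as_posix().encode("utf-8"))
            h.update(b"\0")
            h.update(path.read_bytes())
            h.update(b"\0")
    return h.hexdigest()


def source_unchanged(digest_before: str, root) -> bool:
    """Return True if the tree still matches the digest taken before the run."""
    return tree_digest(root) == digest_before
```

A pipeline would call `tree_digest` on the checkout before the first step and `source_unchanged` after the last one. A build step that rewrites tracked files (one of the Makefile-era habits described above) would make the second call return `False`. In practice the hashed set would need to exclude known build output directories, which this sketch does not do.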

Rebecca: I want to ask one or two more questions before we wrap up. First, in my experience, a lot of this kind of work involves change and getting people to change things. What you just said is an example of trying to get somebody to change a very particular thing and change it once. There’s also the change where you need someone to start acting like this instead of acting like this. I need you to have this behavior instead going forward. What have you run into as far as trying to make change happen? Any tips for people who are trying to make change happen in this space?

Grant: We spend a lot of time thinking about the time dimension. We are here now, we want to get there, and we need to do that through a series of steps. Critically at LinkedIn, we’re at a scale where we can’t just flip a switch. Be like, “Everybody, take next week off, we’re going to go do a big thing and when you come back, it’ll be good.” There are cases where sometimes we do flip a big switch, like we’re going to ask one of the infra teams to work this weekend and they’re going to swap out how this thing works and we think we’re prepared to do so.

The thing that’s missing though — even when you do big switches like that — there is always that developer muscle memory, right? Long ago, I worked at Microsoft, and on Windows. I still type dir instead of ls sometimes. And I can’t believe this, for 10 years, I have known ls, but still, sometimes my brain’s like dir. So when you do those big switches, you miss out on that developer education experience where you’re like “Oh, we need to socialize this. We need people to be excited about it and talking about it bottom-up, and top-down saying this new way of doing things is great. We’re so excited about it.”

What’s important to us as part of those stepping stones? Obviously creating automation or putting checks in a place where we say, “Okay, stop going in the wrong direction.” That’s the first thing: we need to stop going in the wrong direction. Actually, I’m getting ahead of things. Typically, we try to paint a picture where here’s the right direction we want to go in. Can we get everybody excited about it? Once everybody’s excited about it, we’re going to put limits in place, so you can’t go in the wrong direction anymore. And then we typically move from there toward automation that’s like, now we’re going to rewrite things so that they move from the old style to the new style. And then there’s always that long tail of things where it says, “Okay, for these hundred projects remaining, we’re going to need to engage a human. We’re going to need to get people involved.”

We have a whole process here at LinkedIn called horizontal initiatives. Sometimes it’s as big as needing a horizontal initiative. So those engage the TPM or they’re more explicitly called out to senior leadership in terms of planning. We might say, “Hey, this needs to get carved out of everyone’s time budget or planning budget for the next quarter.”

There are also some healthy limits in horizontal initiatives. Like, you can’t do a whole quarter of just horizontal initiatives. That’d be crazy. So we have to cap it at something. I forget what the cap is, but it’s pretty low, like 10% or 20%. So if you think of a quarter as 12 weeks, basically, it says that no more than one week can be spent on this. So that’s where we get motivated. The automation has to be really good because we just don’t have that much time from people.
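As a rough illustration of the cap arithmetic above, here is a minimal sketch. The function name `cap_in_weeks` is hypothetical, and the 10% and 20% values are the approximate figures mentioned in the conversation, not a confirmed policy.

```python
def cap_in_weeks(quarter_weeks: int, cap_percent: int) -> float:
    """Weeks of a quarter that a percentage cap allows (illustrative only)."""
    return quarter_weeks * cap_percent / 100


# Assumed inputs: a 12-week quarter and the two cap values mentioned above.
for percent in (10, 20):
    weeks = cap_in_weeks(12, percent)
    print(f"{percent}% of a 12-week quarter = {weeks:.1f} weeks")
```

The "no more than one week" figure lines up with the 10% end of that range (1.2 weeks); a 20% cap would allow about 2.4 weeks.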

Trying to paint a bright future, get buy-in on that, stop bleeding, have automation move people over as best as possible, and then finally close out the long tail. Honestly, sometimes what happens is we just can’t get rid of things entirely. If you get 99 percent of the way there, that’s where you declare victory. There is one percent that will forever remain on the old style, and that’s okay, we can’t do that. When teams come to us and say they’re going to do a hundred percent of this, we’re like: “Let’s reset your expectations. That’s probably not possible and probably not worth it.”

Rebecca: I want to wrap up here because we’ve been talking for quite a while, and it’s been fascinating to hear about what goes on inside of LinkedIn. I’m curious, can you tell me some particularly transformative project that you’ve been able to — even if it wasn’t one that you inspired, but some particularly transformative project that — wouldn’t have happened without this metrics and insights approach?

Grant: I don’t know that I could point off the top of my head to something where I say, “This would not have happened,” but one of the biggest examples I can point at — that I think we’ve had a really great partnership in and has been really transformative for development — has been this idea of what we call remote development.

So this is development using something akin to GitHub Codespaces. If you rewind to COVID, one of my favorite stories here is that everybody has to go and shelter in place and work from home, and senior leadership is asking questions like, “Is everyone still productive? What is the toll of this on our workforce? How are people doing?”

We know that psychologically people are in shock. Some people are sick. Some people are dying. There’s this sense of panic. And I remember very distinctly this one chart, suddenly the build times of one group of people just shot through the roof; they are having a horrible experience. What on earth is happening?

As we dove into it, we realized that it was two different groups of people: one group of people had a ton of dependencies as part of their build and tried to download all of those. Suddenly they’re moving development from the office where they maybe had a workstation to their laptop.

They’re moving from extremely fast networks — 50 megabits per second, or I don’t even know if I got that number right, 500 megabits per second — down to spotty connections that are going through apartment walls. They might prefer or use LTE typically at home because that works best, not the WiFi. So we saw them spike up.

Then there was another group of developers who frequently would create huge bundles, which might be gigabytes in size, and upload them: “I have to distribute this to a system that’s going to do a bunch of things in the backend.” And they’re sitting there going, “Well, I have a fast internet connection, but my upload speed is horrible, I’m getting 100 kilobits or something. I’m now waiting an hour to upload things, and when it breaks, I start all over.” It’s insane. Because a lot of these systems had been built in environments where it was, “We don’t need to retry that, it works the first time. I did it, it’s fine.”
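The "when it breaks, I start all over" problem can be sketched in code. The following is a minimal illustration of one common remedy, chunked uploads where only the failed chunk is retried with exponential backoff. It is not LinkedIn's implementation; `upload_in_chunks`, `send_chunk`, and `flaky_send` are hypothetical names, and the tiny chunk size is chosen only to keep the demo short.

```python
import time
from typing import Callable


def upload_in_chunks(
    data: bytes,
    send_chunk: Callable[[int, bytes], None],
    chunk_size: int = 4,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> int:
    """Send data chunk by chunk, retrying only the chunk that failed.

    Returns the total number of send attempts made.
    """
    attempts = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        for retry in range(max_retries + 1):
            attempts += 1
            try:
                send_chunk(offset, chunk)
                break  # this chunk succeeded; move on to the next one
            except ConnectionError:
                if retry == max_retries:
                    raise  # give up after the final retry
                # Exponential backoff: base_delay, 2x, 4x, ...
                time.sleep(base_delay * (2 ** retry))
    return attempts


# Demo with a simulated flaky connection (all values are made up).
received = {}
failures_left = {4: 1}  # the chunk at offset 4 fails once, then succeeds


def flaky_send(offset: int, chunk: bytes) -> None:
    if failures_left.get(offset, 0) > 0:
        failures_left[offset] -= 1
        raise ConnectionError("simulated dropped connection")
    received[offset] = chunk


total_attempts = upload_in_chunks(b"abcdefghij", flaky_send, base_delay=0.0)
reassembled = b"".join(received[k] for k in sorted(received))
print(total_attempts, reassembled)
```

With this shape, a dropped connection costs one extra chunk attempt instead of a full restart of a multi-gigabyte upload.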

So one of the things that we put a big press on was to say, we need these remote environments to work, so people can develop as if they’re at the office, but they’re not. We needed that yesterday. Open as many VPN connections as you can, scale all this stuff, and move Artifactory, which we use internally, to more points of presence: you need to put one in India, one all around the world. Anything you do, you need to be able to access data fast and we’ve got to close the last mile gap.

I have a set of charts that I probably can’t share, but you watch these times spike up, and then as we add points of presence or we get developers shifting on, you watch them come back down, and then they start to pass where they were at before, and you’re like, “Oh my gosh, now the system is faster than it’s ever been. And when they work in the office, it’s even faster. When they work from home, it’s really fast.” This is a huge step function in terms of overall productivity (that’s a phrase we use a lot). We’re always chasing after the 10X step function, and I feel like that was something we were able to contribute to really well.

Rebecca: To have all of the metrics in place (I’m sure you didn’t have all the metrics in place that you wanted that day, that week, or that month, but to have that observability in place) so that you can react rapidly is pretty huge.

Grant: It very drastically changed roadmaps and reset priorities within a month. It was like, “Oh my gosh, we have to go do this, this is a smoking gun. Chase that problem down.”

Rebecca: I remember where I was working at the time, there was a team that was dabbling in remote development: “Maybe this will be useful someday!” And suddenly, they were everybody’s best friend.

Well, Grant, this has been a treat just like I expected. Thank you so much for spending almost an hour with me to talk about all things productivity at LinkedIn. This has been great. Anywhere people should find you on the internet or see you give a talk?

Grant: LinkedIn, obviously, feel free to connect there. I try to make a habit of posting there fairly regularly. If I have an upcoming conference talk or something, I’ll certainly share it with my LinkedIn audience first.

Rebecca: I guess that was kind of a gimme that we should find you on LinkedIn. Alright, thank you so much, and I hope to run into you soon. Take care.

Grant: Thank you, Rebecca. Take care.

Follow Engineering Unblocked on Apple Podcasts, Spotify, or in your favorite podcast app.