Transcript
This transcript was generated automatically and may contain errors.
Hey there, welcome to the Posit Data Science Hangout. I'm Libby Herron, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you're not joining us live, you're missing out on the amazing chat that goes on, so find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.
So today I am super excited to introduce Tom Grace and Darren Cope from HMRC. Let's start with Tom. Tom, I would love it if you could introduce yourself, tell us a little bit about your role, and maybe a little bit about HMRC in general.
Okay, hello. So yeah, I'm Tom. HMRC, for those who are unfamiliar or from outside of the UK, is the tax office. My role is as a tech lead on the Posit platform team. What we do is build the Posit installations (we have about seven or eight of those now, of Workbench and Connect) and make sure they are shaped the right way so that data scientists across the organisation can do the best work they can. The tech lead role is partly technical, but it's also quite heavy on the culture aspect of leadership: making sure that our team is a good team to be on, and that we are approachable and find it easy to be empathetic with the people who use the platforms and systems we build.
I think the coolest thing you said in all of that was that you have a Posit platform team, which means you have a team of people dedicated to making sure that people can use the tools you have and know how to use them. I think that's really rare and kind of amazing, because I hear from people in the chat all the time that we get tools from vendors, from suppliers, and then they kind of come to our organization and die out. Nobody knows how to use them; they're not very well supported. It really takes an internal effort to make sure that happens.
Oh my God, it works. I am on my phone. How thoroughly modern. All the devices fail me.
I was going to say that would be fantastic because I know that Darren is a Posit subject matter expert, which is a very unique role. I'm not sure I know any other Posit subject matter experts at other organisations. Tom, what is Darren's role on the team?
So Darren is part of, he sits between the platform team and also our fairly new Posit adoption team. And it's his job to understand the nuts and bolts of what it is people need to do on the platform. So the platform team themselves are coming at it from the perspective of, oh, we've got some servers over here. And Darren's coming at it from the perspective of I've got some data sets. How do I get access to those? What do I want to do with them? What kind of problems are coming down the track? How can we use this better? Are there situations where people are using legacy tooling at the moment where with a bit of support, they could move across to something that's a bit more modern and easier to work with? So it's a greater focus on user need rather than the technical aspect of it.
One thing I heard from Darren was that when I talked to him earlier this week, he said, I just came out of a coffee and coding session for people who want to use Posit tools and have never used them before. And he had over 160 people in there, which is wild. That's as many people as we have on the hangout call right now. So all those people from your organization who are interested in using different tools, who get the chance to have an empathetic person sitting there and saying, like, let me just walk you through how this works.
Yes. I mean, it's a very large organization, so we've pretty much got some of every kind of user. There's a fair number of SAS users. There's, as you'd expect, shed loads of Excel users. Pretty much any large enterprise tool you can think of, there's a fair few users of it. There are people who are perhaps multi-decade SAS users who would be interested in moving across, but it's quite a learning curve for them.
Tools, data, and scale at HMRC
So yeah, I don't do any data analysis myself, but the organization covers everything from things like import and export duty onwards. There are teams running analysis on bulk corporate returns, or looking back at historical data for patterns. There are teams modeling proposals for the government: if we introduce this, what might that do? There are teams doing loads and loads of different things; it's a huge breadth. On Posit alone, I think we are edging towards about 600 analysts, across a huge variety of areas, experience levels, and focus areas. There are people building tooling that uses AI models to expose internal documentation better, but there are also people doing kind of routine, almost enhanced-spreadsheet type work.
It's a huge breadth of things; the sideways scope of it is massive. People come to us when there's a problem shaped like, "I wish this machine was bigger."
Do you have a lot of R users who are in RStudio or Positron? Do you have a lot of Python users who are in Positron? What does that sort of composition look like for you as far as the people that you're serving?
It varies a bit between teams. I think we're about 50-50, or maybe slightly under: slightly fewer R users than Python users. The Python users are mostly split between Positron and VS Code, gradually moving more towards Positron. We have got some people still using Jupyter. And we try to support whatever tools work best for people. We try as hard as we can not to push people in a particular direction, but to make available the things that are right for their use case and to talk to them about why a certain thing is easy or hard. In terms of things deployed to Connect, the majority these days are Python. I think the future for Connect apps looks quite heavy on Python, at least for the more trailblazing type stuff, but we've got plenty of Shiny and plenty of R as well.
Evaluating and introducing new tools
This question says, when people suggest new tools that they want to use, what is the most helpful information that they share? How does that process go? This says, in short, can you help teach us how to get things approved by our own teams?
The way we tend to work: if I take a simple example, like an extension to Positron, which is pretty easy for us to add, the main questions we'd have are: what's the licensing like? What's the support, and how actively maintained is it? And is it solving a unique problem that isn't solved by something else that's already available? Something like that is quite simple for us to do. The main reason we don't just blanket-allow VS Code extensions is that we need to make sure we can keep them maintained for the foreseeable future, so that we don't pull the rug on anybody.
For more significant technologies, say if we were introducing Connect now as something new, the main thing is: what does this give you that what's available now does not? Can you build an EC2 server and stick your Python app on it? Why isn't that good enough?
And the way we work, the amount of evidence required increases with how complex the piece of work is. An example where we didn't implement something was the inference servers for some of the self-hosted AI models. We were playing with those early on, before we had access to models provided by cloud providers, to see whether we would benefit from an inference server. There were some costs, there were some complexities, and there's business approval and support type stuff. That's quite a high bar.
So the effort needs to be worth it. And I think where there's maybe a gap sometimes is when people are asking about new things, they're seeing it as: I can grab this on my laptop at home, I can make it available, and if I get bored of it in six months, I can get rid of it. Whereas quite a few larger organizations in particular are looking at it and going: as soon as you put that there, we're running it for 10 years. That is going to be used by someone and we're going to have to maintain it.
Dynamic compute and budget allocation
So people have some context around that as well, because I think it's very cool. That's been very popular with people. It's something we did a small-scale rollout of sometime last year, and it's gradually getting more adoption now. It's built on top of the Slurm launcher inside Workbench, and what we've done is configure that in such a way that instead of having a static Slurm cluster and slotting things onto it, we launch EC2 instances as people request them. The thing that's been very popular is that in the dropdown you have a list of instance types, with the price per hour in dollars. So an analyst can be given a monthly or project budget; they might choose to spend it running a couple of different small instances for a couple of months, or they might choose to run a huge instance for a day. It empowers them to make those decisions.
And we've built it in such a way that we can follow the org chart. Senior management can set the overall budget, and then team leads can have part of it delegated down to them and pass that out as appropriate. That has been incredibly popular. The initial use case was people who needed to do some particularly large processing on an occasional sort of schedule, and that sort of model is difficult to fit with the way large organization budgets are set. But using knowledge of how the organization thinks about budgets, we built a thing where they can go: OK, allocate this much money, that's your pot for the year, once it's gone, it's gone. That fits the budget cycle, and then they can use it dynamically as they need.
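The delegated, pot-based budget model Tom describes can be sketched in a few lines. This is purely illustrative Python, not HMRC's actual implementation: the class, team names, instance types, and prices below are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Budget:
    """A spending pot that can be delegated down the org chart."""
    owner: str
    allocated: float              # dollars for the budget period
    spent: float = 0.0
    children: dict = field(default_factory=dict)

    def remaining(self) -> float:
        return self.allocated - self.spent

    def delegate(self, owner: str, amount: float) -> "Budget":
        # A manager can only hand out what remains in their own pot.
        if amount > self.remaining():
            raise ValueError("not enough budget left to delegate")
        child = Budget(owner, amount)
        self.children[owner] = child
        self.allocated -= amount
        return child

    def charge(self, instance_type: str, price_per_hour: float, hours: float) -> None:
        # Analysts trade size against duration: once the pot is gone, it's gone.
        cost = price_per_hour * hours
        if cost > self.remaining():
            raise ValueError(f"{instance_type} for {hours}h exceeds remaining budget")
        self.spent += cost

# Senior management sets the overall pot, delegates to a team, and an
# analyst chooses between a long-running small box or a huge box for a day.
org = Budget("analysis-directorate", allocated=10_000.0)
team = org.delegate("vat-modelling-team", 1_000.0)
team.charge("r6i.large", price_per_hour=0.126, hours=300)    # small, long run
team.charge("r6i.16xlarge", price_per_hour=4.032, hours=24)  # huge, one day
print(round(team.remaining(), 2))  # 865.43
```

The key design choice mirrored here is that spending limits are enforced at delegation time as well as at launch time, so a team lead can never hand out more than their own remaining pot.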
Training and keeping up with data science
This is an IT perspective, trying to keep track of business as usual, but also trying to make time for data science training. Tom, do you all inside of your sort of admin or IT focus organization train in data science yourselves and try to keep up with what your end users are doing?
We don't train formally, but as a team we have a system of dogfooding. We have our own tiny Posit Connect tenancy that we use for our own purposes, and we use the tools we manage to build internal things like dashboards. For example, we have a release dashboard so people can see changes to the platform over time; we build those things on top of Connect. We use it to do ad hoc analysis as we need to. So we bake it into our daily work, and we try to budget for the fact that doing things like that is valuable in and of itself as part of the team culture. We make sure that we don't overcommit as a team.
For example, instead of analyzing how long some tickets took in Excel, where it's familiar, taking ten times longer and doing it in Python or R in order to become familiar with those tools is an acceptable thing to do. There have been a few examples where there was something we could have done very quickly if people had just used the tools they already knew and smashed the thing out, but we've structured the team in such a way that you can actually train implicitly.
It's difficult to do as an individual contributor. It needs to be an intentional feature of the team and an accepted thing, I think, by everyone involved.
Managing language diversity and knowledge transfer
She says: I've noticed that when someone uses a programming language that no one else at the company uses, there can be difficulties when they leave. Either their work needs to be converted to another language, or a new person needs to be hired who can use that different language. How do you minimize this issue? Do you have any thoughts on this from your perspective?
At the organization level, it's so big that we sort of don't end up with that problem, but within the team we could, because we're a relatively small team: there's five of us, pretty much. What we tend to do is try to minimize it, and similar to introducing a new tool, we take a similar approach of really justifying why you need to bring it on. If a new tool is worth bringing on, it needs to be more than just a personal preference. So we use a lot of Ansible for things. It's not something I like very much; I'm not a fan of Ansible, but we still use it because it's in wide use across the team and the wider organisation and is unlikely to go anywhere.
So on that side, we try to avoid the problem. If something is worth bringing in, then we try to make sure we're sharing the work. Similar to doing things the slow way in order to learn more, we try to have less experienced people take on areas of the system they haven't used much before. Again, it's a little bit slower, but we try to avoid the situation where one person specializes in one area. As you'd expect, we still have cases where someone is very familiar with something, and we'd lean on them if something especially unusual happens. But we try to spread work around so that people do pick up the language that otherwise only one person would use.
And even if it is slower, it tends to be valuable, because every language has its own quirks, good sides, and bad sides. For anyone who writes code, learning another language makes you better at the one you knew before, as well as teaching you about the new one. So if you mostly write R, spending a few weeks being annoyed at Python, or vice versa, will show you new patterns and new approaches, and you will learn something from it. It's not just a time sink.
Upgrading the platform and release cycles
How often do you get to upgrade your Posit environment to bring in new packages, features, etc.? When someone wants to use a package or a new tool, do you have a process that they have to go through to get packages approved and then also versions of R and Python?
So we make those available via package manager and then we have a facility where the individual teams are able to configure allow lists if they want to. So we would make PyPI available with standard scanning stuff like x-ray, filtering out anything that's particularly malicious. But then we have a facility where teams can configure their own allow lists by providing a list of packages and then they can have a filtered view of package manager. And some do that for safety reasons and some do it just so they don't get a sprawl of slightly different implementations.
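As a toy illustration of that team-level allow-list idea (this is not Package Manager's actual mechanism; the team names and packages below are invented for the example), the filtering logic amounts to something like:

```python
# Each team may maintain its own allow list; requests for anything else
# are filtered out of that team's view of the package index.
TEAM_ALLOW_LISTS = {
    "vat-modelling": {"pandas", "numpy", "scikit-learn"},
    "customs-analytics": {"pandas", "polars", "duckdb"},
}

def filter_packages(team: str, requested: list[str]) -> tuple[list[str], list[str]]:
    """Split a package request into (allowed, blocked) for a given team.

    Teams with no configured list see the full, already security-scanned
    index, mirroring the opt-in allow-list behaviour described above.
    """
    allow = TEAM_ALLOW_LISTS.get(team)
    if allow is None:
        return requested, []
    allowed = [p for p in requested if p in allow]
    blocked = [p for p in requested if p not in allow]
    return allowed, blocked

ok, no = filter_packages("vat-modelling", ["pandas", "requests"])
print(ok, no)  # ['pandas'] ['requests']
```

Teams can use a list like this either for safety or simply to avoid a sprawl of slightly different implementations of the same task, as Tom notes.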
When it comes to anything that would require an OS package, we roll it into our standard release cycle. The way we do releases is we build everything into Docker images, just as a convenient deployment artifact. In our development environment we build several new versions per day, and that gets rolled out to dev maybe one to three times a day. Any change, like adding a new package, just goes through that process, and then we apply updated versions to each of our production tenancies on a mostly monthly basis, though some of our more trailblazing tenancies do every two weeks. We pull in new versions of Posit roughly a week after release, for the most part. But for people who want to try things that are a bit newer, we also have an environment that runs the daily builds of Workbench, so that when we're chatting to our friends on the Posit account and tech teams and they're telling us about shiny new things, we can just open it and go, I don't like the shade of that.
Our end users also have access to our source code repositories, and some of them have made changes. So it's not crazy fast, but we can roll a new version out to everybody probably within about a month of release.
We tell people via a Teams channel that updates are coming, and we have prearranged maintenance windows for each of the tenancies that align with their priorities. There's an agreed, kind of "third Thursday of the month" type schedule. We also have particularly engaged people from the teams, the more trailblazing users or the more senior people (not quite managers), who come to our planning sessions and stand-ups. So people are aware of upcoming changes at the point where we're discussing what we're going to do, and they can give feedback then.
Often, yeah. For requests other than Posit updates, those changes have usually come from users directly: logging JIRA tickets, then coming to our planning and discussing where they should sit in terms of priority, how broadly applicable they are, how soon they're wanted, that kind of thing.
Staying close to users
It's not just the formal things like planning. We have Teams channels for questions and support, partly team-specific, for things like "what library would you recommend for this use case", where people kind of self-answer with what suits their team. We as the platform team sit in those channels, and we'll jump in if we see people having challenges with things. Quite often what we might do is just give them a call and watch them struggle with however it is they're doing things, and that might become: oh, we could use some help here, or we need to make a change so it's harder for this error to happen.
And by actually seeing how people use it, we've learned a lot more about how the system should be, because we're approaching it from a system administrator perspective rather than as people doing data science. So we might spend two hours on a call with an analyst, talking about how they're using Python, the issues they're having, and what we could make available that might make their lives easier.
I think we're getting close to 600 analysts.
Change management and organizational inertia
No, I wasn't there when it was adopted. I've heard anecdotally that our process for adopting new things is slow, but it does kind of rumble through; getting new things in is not that difficult as long as they're solving a completely new problem. My understanding is that Posit came on during COVID, when sort of emergency rules were in effect, so I think it got in under the "we have an urgent problem to solve" banner.
I can speak to our more routine change, though. Not something big like "we'd like to adopt Posit", but our change process for how we update the platform, and that was empathy, pretty much. I come at change management from a traditional "I want to do things" background: what are all these people doing putting forms in front of me? And it took a chunk of time to build empathy and trust between us and the change management people: us understanding what their priorities are, what they're trying to do, and that their remit actually goes beyond "protect the users, make sure there's not an outage". There's a lot more to it than that. There's coordinating wider business events, where something we don't know about might depend on our system, or something we depend on might be changing at the same time; thinking about risks; making sure all of the impact assessments and so on are in a sensible place.
That was quite a long journey, but having gone through it, the last time I went into the office I got a hug from someone on the change team. It's about relationships and empathy, pretty much.
Team performance and capacity
Sometimes it feels like only a few people on a team are stars who really move the needle. As a leader, how do you raise everyone's level so the whole team performs like your top players?
I think something we try to do, particularly when it comes to externally facing work with other stakeholders in the business, is to act as a team and to work in a way that all of us are comfortable with. I don't really like to think of it in terms of people being high or low performers. There's a set of work that needs to happen; people have specialisms and preferences, but the team as a unit is trying to get things done. I feel there's a bit of a risk with too much of a hero culture; that can get you into a bad place, and it's a bit of a downer for people who aren't being specifically called out.
I particularly like it when we get praised as a team. Occasionally we've had feedback from people calling out the team as an entity that is delivering what people need, and I think that's a good thing. You can make sure people are doing the right work that suits them, but I'd also think in terms of: is the team structured, and is it receiving work, in a way that allows everybody the time and space to succeed, and also the space to do things that aren't a strong suit, so they can learn from that? If you're running at 100% capacity all the time, you're in trouble as soon as someone's a bit under the weather or worried about something.
If you're at that kind of capacity, it's going to be bad, so maybe push back a little bit and try to have a slower pace of deadlines, that kind of thing. Building trust with the wider organisation really helps with that. We don't get "do it" type requests; we get conversations about what's possible and how it might be doable. We can start with informal conversations: that's going to be easy, that's going to be hard, you can have that in a couple of weeks, and this will probably be eight months. Sometimes people go, actually, it's not worth that much effort. You're having honest conversations with stakeholders, and it's not work flowing downhill, it's a collaboration. And some of that collaboration is accepting that sometimes it's going to take us a bit longer to do things; it's a lot of trust-building to get to that point.
I think trying to avoid that situation in the first place is probably the way I would approach it.

Well, I don't think I could have asked for a better piece of career advice, because all of that is such a good idea, but so hard to actually implement, right? Stay at 85% capacity, allow yourself that buffer, build coalitions inside your organisation, build trust, work on things collaboratively. Everything is a good idea until someone has to implement it, and it takes a long time. So thank you so much for sharing all of your wisdom today.