Coding vs. thinking programmatically | Samia Baig | Data Science Hangout

I think it's really about trying to understand what their needs are and what works for them, by putting yourself in their shoes and seeing what processes they're used to or how they have been taking in data insights over time. Being able to develop solutions based on that is probably one of the most important skills I've learned.

Domain knowledge and cross-team visibility

I'll start with trying to understand exactly what subject matter you reside in. Is it pharmacovigilance with epidemiology, or where do you sit in the organization? Beyond that, my question is really: is there any kind of community among data scientists across the whole company, potentially spanning discovery, CMC, non-clinical, clinical development, and pharmacovigilance? To what extent are you connected with other data scientists across the organization, and to what extent do you have visibility into who's doing what? Ideally, I think data scientists in different contexts can learn from each other, even if they're working in different subject areas.

Yeah, I definitely agree with that. I work in the data science and digital health team, and we work in research and development. So I work more on solutions for different clinical trial stakeholders. Down the line, I'll also be involved in projects that are in the same lane but cover different aspects of that work. So I'm still learning, and part of my answer to that question is that I don't completely know all the different aspects of what's there, because I will still be getting involved in those different aspects.

But one thing I have learned working in pharma, and it's been two years now in my current role, is that J&J in particular is a very big organization. There's a lot of data analytics in many different areas that I can't really speak to and don't have full visibility into, but I'm trying to learn. Sometimes I'll see someone's title pop up and I'll say, hey, can we schedule a one-on-one? I'd love to learn more about what you're doing. For me, that's really helpful for getting more visibility, particularly as someone who started here just a few years ago.

Even when I worked at the Department of Health, even though the New York City Department of Health is just one city's department, there are 6,000-plus employees. And we were just one agency in a broader scheme of other agencies. I was a little siloed from other teams, and sometimes teams in a neighboring bureau were doing very similar things, and I didn't always have full visibility into that.

But I think, and hopefully I'm answering this question correctly, I do agree, because I believe this work is very collaborative. What happens in silos is that there are people doing some similar things and some different things. It's always really good to know all the different aspects of data, including ones I don't fully know just because I've never been exposed to someone working on that specific aspect.

At the Department of Health, how I actually got really involved in knowing what other people were doing was that I worked during the COVID-19 pandemic. In that emergency situation, we got regrouped into other data teams or other groups, focusing more on responding to the pandemic rather than doing our daily activities. Things like that helped a lot. How I'm bringing that into my current role is that every time we have these periodic meetings, or even if we don't, if I see someone working on a specific area, like commercial, I'll just reach out to them.

But it's a little bit more up to me now to build that awareness. From an organizational perspective, I think there are sometimes initiatives to do that. But in large organizations, I've noticed that it's easier to get a little bit siloed, just because of the volume of people working on different things.

Absolutely. I work at Posit, which is a much, much smaller company than Johnson & Johnson, and it's really up to me to build my community as well, just because we're all doing such disparate things. Gonzalo said in the chat that making a community is not a spectator sport; you've got to take action. And I agree. I think it's really rare for companies to actually prioritize and have someone organize an internal community of practice. But nothing is stopping you from reaching out to a bunch of different people, including end users of your data products, getting to know them, shadowing them, and finding out how they use what they use, what their pain points are, stuff like that.

Biggest data challenges

I mean, you work in all these areas, you're in J&J, a very big organization. You know, you've got data that you work with all the time, and then data sort of several steps removed from, you know, the things you deal with. So if you could speak to a little bit, what are the biggest data challenges you have? Is it internal versus external type data? And if you had a magic wand, what problem would you solve in that area?

That's a great question. I think I might have alluded to this, or someone else might have asked about it, with the pipeline robustness question. Sometimes the issue, and this isn't specific to any company, is that an idea comes, we want to do it quickly, so let's just do what we can. We're not focusing on the more upstream processes; we're just trying to leverage what we can downstream to get something out. I think these can be a little bit like band-aid solutions. We're trying to get things out, but we're not seeing that if we fix the upstream problem that needs to be fixed, it can significantly reduce the time I spend making that same change within three different projects.

It goes back to that mindset of developing our pipelines in a way that minimizes all of the fixes happening way downstream. Otherwise, you keep making these individualized solutions downstream when some of them could definitely be fixed upstream, where the fix could scale to multiple projects, and in the long run we wouldn't be spending so much time doing that.

That's something I would love to fix. And if I had a magic wand, if a magic wand could write documentation for me, that'd be amazing, because documentation takes a lot of time. It's one of those things that are kind of dry, but you know it's very important. Sometimes I enjoy documenting because it's almost like a teach-back to me: oh, this is all I've done, and this is what it is. But it can be very time-consuming. So if I had a magic wand to document all the thoughts in my head the exact way I wanted, I wish there was an easy way to do that. I know some people are using AI for that, but you still sometimes need to contextualize exactly what the thing is. So I think with AI, creating documentation would still be a hybrid process.

Career advice: join a community

With our last five minutes, I would like to ask you a career advice question. Samia, what is a piece of career advice that you wish you could go back in time and give a younger version of yourself, when you were just getting started in data or leaving school, or something that you like to give to others?

Sure. In terms of career advice, I think for me the biggest thing was joining a community. I'm not saying this just because I'm in a Data Science Hangout. Join this one. Being for real, though, a lot of my learning happened outside of my work, in the community. I joined TidyTuesday. That advice was actually given to me by someone at my first job, because we had a small but mighty R user group at my workplace. I think just joining more groups helps. I think everyone here is already kind of there, but even in-person groups, if your city has one, are really amazing. I met Rachel at the R user group when I was in Boston many years ago.

So I just really learned from people, learned different things. I was a little intimidated, and sometimes I still am, even after all these years, going to groups and thinking, oh, these people know so much, what do I ask, what do I do? But in the community, I do notice that a lot of people are very much on the same page, no matter what industries we work in or whether we're starting out or years into this.

But I do feel like joining these communities helps, and particularly not just joining something that's more of a talk session. With TidyTuesday, when I was doing that every Tuesday, and then eventually the 30 Day Map Challenge, it was an excellent way to learn from other people. I went from feeling like I'm okay at ggplot to feeling really excited and good about the work I've done, just because other people shared their ideas. It also helped me think about my own unique perspectives and think creatively through R code and things like that.

And even though that was more specific to dataviz, it also helped me get better at writing code, writing some R packages in my own time, and re-emphasizing the importance of version control. So I do highly recommend that anyone who is interested in open source languages try to join some of these more practical projects or social, community-based projects, because it really helped me get from a point where I felt somewhat okay but knew I could do more, to feeling more confident. I think that definitely helps bridge those skills.

That's fantastic. I 100% agree with that and want everybody to get involved in the community and share your work out loud. Thank you so much, Samia. This is where I will plug, again, the Data Science Lab, because we have so many things coming up that have to do with what Samia just said. On March 10th, we're going to have Jon Harmon, who is the maintainer of the TidyTuesday GitHub repo. He's going to show us how to submit a TidyTuesday dataset, how to curate one, and what happens behind the scenes when datasets get submitted and curated and how they end up on the TidyTuesday repo. Then the next week, we're going to have Joey Marshall, who's going to be live coding an analysis of a TidyTuesday dataset, I think using Claude Code, and this is the one where you'll see him use voice transcription.

So, if you're just curious about how other people work, you want to see some live coding, come join us at the Data Science Lab. It's where we share screens.

Featured software

dbplyr

tidyverse

tidyverse.org