Meet Kelly: She served the U.S. Air Force as a Linguistic Instructor, later took a turn in her career, and went on to build data teams across different organizations. Now she works at Wellthy, scaling their data team, and serves as an Adjunct Professor at Creighton University. You'll be blown away by her experience, because we sure were! In this interview, read about how she stumbled into data science, her advice to new data leaders, and how she built not just data teams but entire data architectures at the organizations she has worked with.
Tell us a little bit more about yourself, Kelly.
I started off my career in the Air Force and worked in intelligence and reconnaissance. We collected lots of data that we would have to analyze. I wouldn't say I was doing quantitative data analysis, and I certainly wasn't programming, but that's kind of where I started to get my taste for data.
I just stumbled into data science, and it's funny because I think a lot of people in data have a similar story, where they sort of stumbled upon it. You don't normally go to college and end up getting a data degree. People come from all different routes, which I think is super interesting. So, I got involved in a research group in grad school where they were applying data science and social network analysis techniques to problems in international relations. I worked there for a while but got a little bit burnt out working in government and academia.
Then I heard about Hudl, a sports technology company. Initially, I was hired to be part of the data science team. But after starting there, it became very apparent that the company needed internal analytics help. They had no team doing this to help people make better decisions with data.
So I actually switched roles there and was the first hire on this internal team. We grabbed a couple of software engineers and started putting together a data stack. I was there for almost five years and grew it to a team of 15 people. I learned so much in that time, especially about building a data stack and processes that scale well. I then joined an ed-tech company called Aula. I was their first data hire and grew the team to five by the time I left. After Aula sold off their product, I switched roles and started here at Wellthy as Director of Data Science and Analytics. We are a care concierge service that combines technology with a human element to make caregiving easier. Healthcare, especially in the United States, is very difficult to navigate, and caregiving responsibilities can be quite burdensome; unfortunately, they disproportionately affect minority groups and especially women. So we are here to alleviate some of that burden by handling all the logistical and administrative tasks.
There is an evident gap between what academia has to offer and what the data industry needs right now. What are your thoughts on that?
I am an Adjunct Faculty member at Creighton University in Omaha, Nebraska, and what I have experienced is that, unfortunately, academia can be a little bit slow to adapt to the latest trends in the industry. I always tell my students that they're never going to get a clean data set like the ones we use in class. That is not how it really works. You can't go straight to building machine learning models. So I always try to give them a little dose of reality.
The ultimate goal is to make a better business decision, to either save money or generate growth or more revenue, and that's the flashy part of it. This is what universities started focusing on, without realizing that there's so much work to do beforehand to even get to that point. But this is not just a problem in academia; it is the reality of industry as well. Stakeholders think that if they have hired a data scientist, they should be able to do magic with data and produce amazing insights, but it doesn't work that way.
What does the data stack at Wellthy look like?
Our data stack consists of Stitch, Snowflake, Mode, dbt, and we're about to implement Snowplow. So our main source of data is from the Wellthy platform, the backend database of the software itself. We're also pulling data in from SaaS tools such as Salesforce, Ask Nicely, ADP, Zendesk, Intercom, and HubSpot.
Stitch is a really good tool because it's very easy to set up and maintain. We're using Snowflake for all of our data storage, and I love Snowflake. I've used Redshift, Postgres, and a number of different data warehouses and databases, but Snowflake is by far my favorite.
We're using dbt for all of our transformations, and it has become a de facto tool in the modern data stack. It was an immediate game-changer when I first started using it back at Hudl. Then we use Mode as our BI tool, and finally Snowplow, which we're about to implement. So this is Wellthy's data stack.
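To make the flow concrete, here's a minimal sketch of the warehouse-centric pattern Kelly describes: Stitch lands raw data in Snowflake, dbt builds modeled tables on top, and anything downstream (Mode, a script, a notebook) simply queries the models. All account, database, and table names below are hypothetical.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Connect to the warehouse (placeholder credentials).
conn = snowflake.connector.connect(
    account="acme-xy12345",
    user="ANALYTICS_RO",
    password="********",
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
    schema="MARTS",
)

# fct_tasks stands in for a dbt model built from the raw Stitch loads;
# a BI tool like Mode would issue essentially the same query.
cur = conn.cursor()
cur.execute("""
    select task_type, count(*) as n_tasks
    from fct_tasks
    group by task_type
    order by n_tasks desc
""")
for task_type, n_tasks in cur.fetchall():
    print(task_type, n_tasks)
```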
I have a couple of questions about your stack. First, have you had a chance to explore Reverse ETL tools?
We are still very early in our data journey, but we have plans to use Reverse ETL tools. We're not doing it today, but definitely in the future. You're helping all these people make better decisions with data, but if you don't put the data where they're working, that creates too much friction. If a sales manager is in Salesforce every day, you want them to have the data they need to make a better decision right there in Salesforce, and the same goes for other teams. This removes a lot of friction.
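The core of the Reverse ETL idea is small enough to sketch: read a modeled metric out of the warehouse and write it into the tool where the team already works. This is a toy illustration, not any particular vendor's implementation; the tables, fields, and credentials are all hypothetical.

```python
import snowflake.connector                # pip install snowflake-connector-python
from simple_salesforce import Salesforce  # pip install simple-salesforce

# 1) Read a modeled metric out of Snowflake.
conn = snowflake.connector.connect(
    account="acme-xy12345", user="ANALYTICS_RO", password="********",
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="MARTS",
)
cur = conn.cursor()
cur.execute("""
    select salesforce_account_id, health_score
    from mart_account_health  -- hypothetical dbt model
""")

# 2) Push it into Salesforce, where the sales manager already lives.
sf = Salesforce(username="ops@example.com", password="********",
                security_token="********")
for account_id, health_score in cur.fetchall():
    # Health_Score__c is a hypothetical custom field on the Account object.
    sf.Account.update(account_id, {"Health_Score__c": health_score})
```

A dedicated Reverse ETL tool adds the parts that make this production-grade: scheduling, diffing so only changed rows sync, rate limiting, and retries.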
Second, are you using Snowplow as an event tracking tool, or are you also using it as your CDP?
I would say both. I wouldn't say we currently have a true CDP setup, because we really see the data warehouse as the source of truth for analytics. So let's say we're doing event tracking: a person views a particular page, and we send them an email to encourage them to finish the process with Wellthy. But at the same time, you're missing this whole other part of the user journey, because they could also have been in the Wellthy community. They could have been on our public page. They could have attended a webinar. You're not seeing any of that if you are only looking at one data source.
You need to create a unified view of customers and understand everything you can about them and their behavior to give them the best experience. When you have all these systems that aren't connected and aren't taking a unified approach, you can end up missing a huge part of the customer journey and experience, and you won't know how to provide value to the customer. So I prefer the holistic approach.
By first pulling all the data into our warehouse, with Snowplow sending events directly to Snowflake, we can create that unified customer view.
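In warehouse terms, that unified view is often just a union of per-source event tables into one customer timeline. Here is a rough sketch of the shape of such a query, with hypothetical table and column names:

```python
# One timeline per user, stitched together from sources that all land in
# Snowflake: Snowplow behavioral events plus SaaS data loaded by Stitch.
UNIFIED_TIMELINE_SQL = """
    select user_id, 'product' as source, event_name, occurred_at
    from snowplow_events
    union all
    select user_id, 'support' as source, 'ticket_opened' as event_name,
           created_at as occurred_at
    from zendesk_tickets
    union all
    select user_id, 'marketing' as source, 'webinar_attended' as event_name,
           attended_at as occurred_at
    from hubspot_webinar_attendance
    order by user_id, occurred_at
"""
```

The hard part is identity resolution, i.e. making sure `user_id` means the same person in every source, which is exactly what a warehouse-first approach keeps in one place.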
How do you ensure the quality of the data flowing across the systems?
Automated testing is a big thing and very helpful. We have a lot of tests built out in dbt, and some of these are source data tests. As an example, there are a lot of different tasks and types of projects that can get created on Wellthy, and sometimes new types get created that we didn't necessarily know about, and that might affect the way a metric is calculated. So we have source tests that say: here is the list of possible task types that should come through. If something different comes through, it alerts us.
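dbt expresses this kind of check declaratively (its built-in `accepted_values` test is the usual fit). Stripped of the tooling, the assertion boils down to something like this sketch, with hypothetical table and type names:

```python
import snowflake.connector  # pip install snowflake-connector-python

# The task types the metrics logic knows how to handle.
KNOWN_TASK_TYPES = {"research", "scheduling", "billing", "paperwork"}

conn = snowflake.connector.connect(
    account="acme-xy12345", user="ANALYTICS_RO", password="********",
    warehouse="ANALYTICS_WH", database="RAW", schema="WELLTHY",
)
cur = conn.cursor()
cur.execute("select distinct task_type from tasks")
seen = {row[0] for row in cur.fetchall()}

# Fail loudly if the source starts emitting a type we've never seen;
# in dbt this failure would surface in `dbt test` and alert the team.
unexpected = seen - KNOWN_TASK_TYPES
if unexpected:
    raise ValueError(f"Unexpected task types in source data: {unexpected}")
```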
We also try to add logical tests, and we manually test and review our work.
Quality is crucial because people have to trust the data. If they don't trust it, they won't use it. It's really hard to gain trust and very easy to lose it. So you put out a couple of reports with some bad numbers and stakeholders are going to be skeptical every time you send them something after that. It's really important to have a pretty high bar for quality and make sure you're not putting out junk.
There has been an explosion of data tools in the modern data stack. What are the key areas in the modern data stack ecosystem that you feel are still unsolved, and where should the smartest people in the world focus their attention?
I think one of the areas where I would like to see more progress is the ability to quickly and easily put machine learning or other analytics models into production. How do they get updated with new data? How do you handle drift? How do you measure and monitor performance? How do you watch for bias?
Building a machine learning model isn't very difficult. What's tricky is building a good model, being able to repeat it, and making the results available to decision makers.
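To make the drift question concrete: one simple, common check is the population stability index (PSI), which compares a feature's training-time distribution to what the model sees in production. This is a minimal sketch; the binning and the 0.2 threshold are illustrative conventions, not Kelly's setup.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a feature's distribution at training time (expected)
    with what the model sees in production (actual)."""
    # Bin both samples on the training distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    # Floor the fractions to avoid log(0) on empty bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# A common rule of thumb: PSI above ~0.2 suggests meaningful drift.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)
live = rng.normal(0.5, 1.0, 10_000)  # production inputs have shifted
print(round(population_stability_index(train, live), 3))
```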
Amazon pushed out SageMaker a few years ago. I thought at the time maybe that was the solution, but my team and I quickly realized it was only solving a small part of the problem, and it still didn't feel easy to use. Recently, I've been looking at Continual. They are trying to solve this problem, and I think they're doing some really interesting things around it; they're definitely one to keep an eye on.
You also mentioned an explosion of data tools. There really are a lot of new tools, especially in data quality, data testing, and data observability. Then you also have data catalog tools, and they all kind of overlap in different ways.
At Wellthy, we use Guru for internal documentation, and I have all these cards in a collection called "Tools and Technologies." Anytime I see a new tool that looks interesting and solves a problem we might have, I add a card with some links and notes. Not only do I have a ton of cards now, but I keep having to add new categories, and sometimes a tool spans multiple categories. It can be hard to wrap your head around it all and figure out how you might piece everything together to solve your needs. So I would expect to see some consolidation in modern data stack tooling, because I doubt I'm the only person experiencing this challenge.
What would be your advice to companies that are just starting to build data teams, and to new data leaders?
Firstly, don't build everything yourself. It's so easy to look at something and say, "Oh, I can just go build a custom job to extract and load this data, or do this transformation." It will quickly become your full-time job, and it doesn't scale well with more data and more people. There are so many great tools now that you don't need to do everything yourself.
Another piece of advice is to start small. When you say, "I can help customer support, marketing, or sales," and you're doing a little bit of everything, it becomes really difficult to make it all work and have a big impact. In my experience, it's better to start small. Start by helping one team, and iterate on your process and tools until it is working well and you are providing value. Then expand and apply what you learned to other teams. Don't try to do everything at once, because you will quickly get overwhelmed and won't be producing value for anyone.
Rapid Fire
What is your favorite tool in the entire data engineering space?
What is your go-to place to learn about all things data? Any blog, newsletter, or publication that you read?
My go-to place has kind of become the Locally Optimistic Slack community. I recommend it to everybody. I've learned so much from that community and met so many awesome people.
What is one thing that you like and one thing you hate about your job?
I would say I love the variety. There's always something new, always a different problem to solve. I don't know that I would say I consistently hate anything about my job, but handling ad hoc requests can sometimes be frustrating, especially when stakeholders don't fully understand what goes into pulling that one metric.