Great Data Products

░░░░░░░░░░░░░░░░░░░

A podcast about the craft and ergonomics of data. Brought to you by Source Cooperative.

→ Episode 6: The Storm Events Database Explorer


Video also available on LinkedIn

Show notes

Jed talks with Kwin Keuter and Brad Andrick, geospatial software engineers at Earth Genome, about the Storm Events Database Explorer. This collaborative project between Earth Genome, The Commons, and the Internet of Water Coalition provides access to over 1.9 million U.S. severe weather events spanning 70+ years of NOAA’s National Centers for Environmental Information (NCEI) storm records, including tornadoes, floods, hail, and hurricanes.

The conversation explores how Earth Genome approached transforming decades of federal storm data into an exploration-ready dataset with multiple access modes. Kwin and Brad discuss the data quality challenges they encountered, from changing event types over time (at the outset, the dataset only recorded tornadoes) to inconsistent location data across different event types. They explain their design process, which started with user surveys targeting meteorologists, emergency managers, insurance professionals, and local government planners to understand pain points and workflows.

Throughout the discussion, they emphasize Earth Genome’s philosophy of creating as many modes of access as possible: a visual Explorer interface for non-technical users, downloadable CSVs for traditional workflows, a programmatic API for developers, and cloud-optimized Parquet files on Source Cooperative for data scientists. The conversation touches on broader themes about marketing data products, the economics of sustaining open data tools, and the role of government in producing core datasets while enabling external innovation.

Key takeaways

  1. Multiple access modes serve different users — Earth Genome built a visual Explorer for planners, CSV downloads for traditional workflows, an API for developers, and Parquet files for data scientists, recognizing that different users need different interfaces to the same underlying data.
  2. Historical datasets require careful handling — The Storm Events Database only recorded tornadoes from 1950 to 1996 before expanding to 55 event types. Working with evolving data structures across 75 years requires thoughtful design to present both historical and modern data in semi-standardized ways.
  3. Data quality issues are inevitable — From FIPS codes that don’t match known counties to varying location representations (points for tornadoes, polygons for heat waves), real-world datasets contain inconsistencies that must be addressed through ingestion pipelines and documented for users.
  4. Marketing data products requires ongoing effort — Building the tool is one thing; driving awareness and usage requires conference presentations, blog posts, webinars, and community engagement. The team emphasized that making data easy to use means more than just posting it—you have to actively get it in front of people.
  5. Government should focus on core data collection — Brad and Kwin discussed the value of federal agencies prioritizing primary data collection and publishing over building every possible user interface, allowing external organizations to innovate on top of stable, open datasets.
  6. Feedback loops remain missing — Despite building a valuable tool on top of NOAA data, Earth Genome has limited direct engagement with NCEI. Creating channels for data users to communicate with data stewards would improve data quality and help agencies understand the value their datasets provide.

Transcript

(this is an auto-generated transcript and may contain errors)

Jed Sundwall: It’s 10 AM and we’re live. And thank you so much. All right, Brad and Kwin and everyone out there. Welcome to Great Data Products. This is our live stream webinar podcast thing about the craft and ergonomics of data. It’s brought to you by Source Cooperative. We’ll be talking about Source Cooperative a bit today, which is exciting. One quick bit of housekeeping before we start. Another thing that we do at Radiant Earth is the CNG forum. And just yesterday, we published a blog post about the upcoming event that’s happening in October. Early bird tickets are still available and there are opportunities to sponsor.

submit a talk, do a workshop. Brad and Kwin, I believe, both spoke at the first one last year, so they can attest it’s a great event. But anyway, I’ll put a link in the chat and in the show notes. It’s a great event. And, you know, you don’t want to miss it. I think it is very quickly becoming the conference of choice for data professionals. So today we’re joined by Brad Andrick and Kwin… Coyter? I should have figured this out beforehand. Cuter?

Kwin Keuter: It’s [pronounced] Kiter.

Jed Sundwall: Wow, two strikes. We’re going to talk about the Storm Events Database Explorer. Before we get into it, do you mind just introducing yourselves? I don’t care which order you go in.

Kwin Keuter: There’s no other way.

Brad Andrick: Kwin, you want me to? All right. We talked about this ahead of time, that we were just going to shift back and forth, but I’ll run with it. Thanks, Jed, for having us. My name is Brad Andrick. I am a software engineer at Earth Genome, where Kwin and I both work. I’ve been in the game for a little over 12 years. My focus is in digital cartography, geospatial software engineering,

Jed Sundwall: Yeah.

Kwin Keuter: You feel that?

Brad Andrick: I’ve been public sector, private sector, now in the nonprofit world. And I love it. I can give more of an Earth Genome thing in a minute, but I’ll let Kwin give his response.

Kwin Keuter: Yeah. Yeah. Hey, I’m Kwin Keuter, also an engineer at Earth Genome. My title there is Geospatial Software and Data Engineer, which is the most verbose title I’ve had, but it is accurate. Basically, I build data pipelines and APIs. And I also dabble in DevOps a bit. So, yeah.

Jed Sundwall: All right. Well, thanks for coming on. This is an exciting one for me because we’d worked a bit. I mean, we actually didn’t work that closely on this project or anything like that, but I remember talking to Mikel about it months ago and sharing some of my thoughts on this project of an explorer, but also exposing the underlying data on Source Cooperative

so that people can do other stuff with it. That’s what we’re doing. That’s always sort of the dream for me is to have an interface that’s accessible by a lot of people and really beautiful and elegant, but also making sure that people can get under the hood and get the data. So can you talk a little bit more about the project and the genesis of it and who the audience is and things like that?

Brad Andrick: Sure. So I’ll give the genesis and Kwin can tell me what I get wrong and what I get right. So the Storm Events Database exists as a product of NOAA. And specifically, NCEI over in Asheville, the National Centers for Environmental Information I think it is, puts out this Storm Events Database.

And in short, the Storm Events Database captures storm events that occurred going back to 1950 to today. And it includes some unique things inside of it. So those will be an event type. It could be a hurricane. It could be a tornado. Wildfires are in there. There’s also a data narrative, which is pretty interesting. So it’s about the particular event. And then…

that exists and is put out as a CSV sort of product, as a lot of federal things kind of end up being at some point. And there is a user interface where you can search through the database. In addition to those fields about the storm event itself, there’s also the impact that it had. So that can be property damage, as well as deaths and injuries are also recorded inside of that dataset. So that’s what that dataset really is

today if you go to NOAA’s site itself. Now that’s a little bit different than the Storm Events Database Explorer. So the Explorer is kind of our designation on top of that and the project itself. So what that ended up being is, there were some challenges that come up when you have CSVs that you need to download, or those user interfaces where you’ve got a couple of filters and then you’re getting pieces of the data here and there out of it.

But a lot of this data is spatial, so there wasn’t really a great visualization to capture that. And in general, we thought that there were some value adds. And I should mention, the genesis of this project is not Earth Genome. Like, we didn’t decide, hey, we should make this, it’s great. It actually was a coalition to start with, the Internet of Water Coalition, and they had an idea for this project, working with Duke’s

Brad Andrick: Nicholas School for the Environment, and then also The Commons, which is another nonprofit, that all kind of started working on this project idea and needed some of the technical guns to come in and kind of facilitate actually creating it. And that’s where Earth Genome came into the mix to actually build out the tool. I kind of stopped short of saying what the Explorer part of it got into more, but Kwin, I’ll let you go a bit.

Kwin Keuter: Yeah, yeah, we’ll get to that. But yeah, I think that this NOAA dataset is really unique in that it brings together all these different types of, broadly speaking, weather events. Brad mentioned a few of them. I think there’s 55 different event types that are tracked. And it’s really more about the sort of narrative elements of these events rather than like the

Jed Sundwall: Yeah, sure.

Kwin Keuter: hyper-detailed gridded data that you might see, you know, in like NetCDF format, for example. It’s not that. It’s more about what happened with this event, you know, this tornado or hurricane, either in a very specific place or across a broad region. You know, what happened? So the narrative texts, like Brad mentioned, are really useful. But also, like, how do you take

everything from a tropical storm to a wildfire, a hailstorm, and how do you make them all sort of have a similar format? And that’s what NOAA and NCEI have been able to do. And I really think that the best parts of that are, yeah, what were the economic impacts and how were people affected, you know, in terms of safety, with fatalities and injuries?

I’m not really sure where else you would go for all of that. So, yeah, like Brad said, you know, clearly this dataset has a huge amount of potential, or it’s just really valuable. But if it’s in a CSV, or, oh, you can query it, but you can only see like a thousand events at a time or whatever, you’re not really going to get all the potential out of it, I don’t think. So that’s what we tried to do here.

Jed Sundwall: Yeah, that’s great. No, I mean, it’s a perfect example of what’s possible now. You know, it’s just over the past, you know, not that many years, it’s become so easy to see lots of things in the browser and interrogate data that way. And I don’t blame anybody for using CSVs. I love CSVs, CSVs are great. But it’s really interesting. I’m sure you have this very lived experience now, having worked with this data, to see.

If this data goes back to, you said, 1950? That’s amazing. So you just have to imagine sort of the constraints that people were under, in terms of how they thought about what was even possible to share.

Brad Andrick: 1950.

Brad Andrick: And we might get into this, we will get into this a bit more, I’m sure, but that’s one of the interesting things about this dataset and, like, the data discovery portion of this project: in 1950, it only recorded tornadoes. And so, yes. Um, Kwin, was it 1995? Or 96, uh, was the transition over to a bunch more event types. And there’s even more now, the 55 like

Jed Sundwall: really? Okay. Yeah.

Kwin Keuter: Yeah, ’96 I’d still say, I think.

Brad Andrick: Kwin mentioned. So working with a dataset that for so many years was just tornadoes, and then you’ve got all of these other event types that come in, making sure that you can visualize in a semi-standardized way both of those different viewpoints over time was a really interesting challenge to go through.

Kwin Keuter: Yeah, and just, oh yeah, to echo the 75 years’ worth of data, there’s actually some great sort of storytelling on NOAA’s website about the just incredible journey that the data went through, you know, going from, like, basically typing out documents and, you know, filing them in whatever paper filing archives they had, to

you know, whatever was the database of choice, you know, in 1980, and then just the rapid acceleration of different developments in our space. And now here we are. Like, that’s, I think, nothing short of heroic, to shepherd a dataset through all that time. So we’re just, yeah, we’re lucky that we get to

Jed Sundwall: Absolutely.

Kwin Keuter: be on the tail end at this point, and to get to explore this data and try to make it come to life. Yeah, that was really fun, and that’s what we’re here to talk about.

Jed Sundwall: Yeah, no, it’s really cool. I mean, it’s, you know, for me, just cause, you know, I have some biases here, I’m very excited to see the data in Source, and in particular that you all created a Parquet file. And I’m going to pull up, I have the link here. I’ll also put it in the chat. But that, you know, you can now.

I mean, that’s kind of a janky URL, I guess, to share in the chat. I don’t know if people could go to that. But, you know, that loads up really quickly, over 2 million rows of data that you can sort. I mean, for people who don’t know how Parquet works, the sorting experience might be a little bit strange, because it streams in data. So the different columns will come in. But one of the first things I did when you put up this

this Parquet file was to go look at it and just sort by number of deaths. It’s like, what is the most kind of, like, morbid, you know, intriguing statistic there? And it’s, you know, it’s Katrina. You know, Katrina shows up and there’s a narrative about it. And it’s fascinating to think that we can just, you know, I mean, I don’t remember anymore the limit that Excel has on something, 60-something thousand rows, where when I was in grad school, I’d always be frustrated. Like, well, you get a

CSV or some sort of file and you just can’t open it. And now we’re at this point where it’s like, you just send somebody a URL and they can see all this data. It’s, if you think about it, imagine 2 million records in 1950, how they would have thought about that. A 2 million-page book, you know, a 2 million-card card catalog or something like that.

Kwin Keuter: Yeah, and the fact that the queries that you can do on a Parquet file like this are basically only limited by what columns are there and then how much SQL do you want to learn. And now you probably don’t have to really learn hardly any SQL at all. You could just have an LLM write it for you. But yeah, and then the queries are really fast.

Yeah, I’m glad that you’re excited about that, Jed. It is just one sort of mode of access that we’ve built into this product. I think that this product is sort of exemplary of a lot of what we do at Earth Genome, where we try to create as many modes of access as possible, because…

Naturally, someone’s first impression of the data is probably going to be going to the first URL that you shared, the Storm Events, Internet of Water app. They want to see what it looks like. But then they might really love CSVs. We have those. They might want to access an API programmatically. They can do that. Or I’m hoping that they’ll be like you, and they’ll be excited to try out these Parquet files.
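For readers curious about the Parquet mode of access Kwin describes, here is a minimal sketch of the kind of SQL you could run directly against a cloud-hosted Parquet file with DuckDB. The URL and column names below are illustrative placeholders, not the actual Source Cooperative paths or schema:

```python
# Sketch: query a storm-events Parquet file directly with SQL via DuckDB.
# PLACEHOLDER_URL and the column names are assumptions for illustration.

def deadliest_events_sql(parquet_url: str, limit: int = 10) -> str:
    """Build a query for the events with the most recorded direct deaths."""
    return (
        "SELECT event_type, begin_date, deaths_direct, event_narrative "
        f"FROM read_parquet('{parquet_url}') "
        "ORDER BY deaths_direct DESC "
        f"LIMIT {limit}"
    )

sql = deadliest_events_sql("https://example.com/storm-events.parquet")

# With DuckDB installed, this fetches only the needed columns over HTTP:
#   import duckdb
#   duckdb.sql(sql).show()
```

Because Parquet is columnar, a query like this only has to pull the four referenced columns, which is why sorting millions of rows in the browser or a notebook stays fast.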

Jed Sundwall: Yeah. Well, so yeah, let me ask, though, like, who are these sort of personas? I’m curious to know what sort of thought goes into creating a product like the Explorer, and who it’s for, and how it’s going to be put in front of people, and who those people might be.

Brad Andrick: Yeah, so part of the process that we go through for most of our projects, and in some ways this process I think will be changing, maybe a bigger discussion on what that looks like for the industry, but we always engage with our users as much as we can. So we started with a user survey. There was some initial work before we actually got handed the project

to think about people who are in the space already, to think about, okay, what would make this better? And after reviewing that, we also went out to the wider team at the Internet of Water and said, hey, we want to talk, I’m going to send out a survey basically to more people. And so we did, we created a survey with 20 different questions: user backgrounds, their current workflow through the existing Storm Events Database, their pain points and challenges,

preferences for new features, and then any final overarching thoughts. And the coverage, with the people that it got sent out to, had meteorology and climatology wrapped up in there by a few different people, emergency management, kind of disaster preparedness folks. There was insurance in there as well, real estate.

And some energy policy, I think, might have been in the mix as well. So when we kind of think about those users, that’s roughly the mix of people. So it spans scientists to someone in the insurance industry thinking about risk, or a local government that’s planning for, okay, this area had this much impact from a disaster and this was the cost of it, right? We need to understand that risk, plan for finances, and all those sorts of things.

Kwin Keuter: Yeah. Brad, did you mention local government planners? I think that was one on our list of target users. And I think that represents a segment for whom this would be especially useful as a product: people who need this type of data, but maybe they don’t have, like, an army of

Brad Andrick: But… Yeah.

Kwin Keuter: you know, data scientists, like an insurance company might have, to build some detailed model. You know, but if you’re a mid-sized city or a small city and you just need to quickly access weather event narrative data, I’m hoping that people will hear about this, so that they can be like, yeah, now I just go here and it’s just that easy. So I think that, especially, the people who don’t have

resources or expertise, you know, I’m hoping it gets to them.

Jed Sundwall: Right. Yeah, I mean, that kind of persona of somebody who doesn’t, they don’t know how to write a SQL query, they don’t know what SQL is. And this is also a fascinating dataset, because it’s the sort of thing that I think a lot of people imagine does exist or should exist, you know? Like, yeah, surely somebody has a dataset of all the storms. But to find that, and then to be able to, as you all were saying before,

query it in any sort of useful way by geography, or interact with it, was just too much of a lift for a lot of people. So this Explorer, hopefully people will find it. I am curious to know, though, and this might not be part of your job description, but do you know of, like, are there plans to get it into classrooms, or train people on it, or make sure people know about it?

Brad Andrick: I don’t know of anything like that. I would love for that to happen. So we did the last step of the project cycle, and right now we’re kind of moving to maintenance mode. We run updates on the project to keep the data up to date. But that’s kind of where things live right now. The last step, though, before we moved into that mode, was communication, in a way, which was mostly

Jed Sundwall: Okay. All right.

Jed Sundwall: Okay.

Brad Andrick: blog posts, so The Commons case study. We did a blog post outlining our process. Then there was a webinar run by the Internet of Water Coalition that Kwin and I were on, and we gave the walkthrough of the tool. 20 or 30 people, I think, were on that at the high point. And then, finally, there also was a talk at FOSS4G North America this year up in Reston

that kind of outlined a little bit more of the technical details, but still brought up the project. But that’s kind of where that project’s term ended, for right now.

Jed Sundwall: Yeah. Well, this is interesting. I mean, you know, I don’t want to put anybody on the spot, but that’s the tricky thing with a lot of these things, is that we’re at this point where it’s kind of cheap to create something like this.

then it’s a living thing, presumably, that needs to be sustained for a long time. So anyway, this is a plea to the funders listening in: you should continue to fund these kinds of things, to make sure that we can drive usage of them. I’m curious to know, what’s NCEI’s involvement here, other than just kind of being where the data comes from? Are they at the table?

Brad Andrick: Not really, which is interesting. I’ve reflected on that myself. I’m like, that’s interesting. So I know with the Duke school, they had some people that they worked with directly at NOAA to get some opinions and consult on things like the categories. So one of the things that’s different between the original dataset and what you’ll find in this product is a categorization. And so there was a lot of time

Jed Sundwall: Yeah. Yeah.

Brad Andrick: that went into figuring out where those breaks should be, how we should group things together. And so that had some NOAA consult back and forth. But as far as the NCEI Storm Events Database and anybody that’s currently staffed on that project, we didn’t really have much of a relationship with them. That said, we’ve been following along with what they’re working on. So that includes the data releases. But more than that, there has been an initiative to

redo the user interface that currently exists. And it’s interesting to follow, because the last update that I had heard was from a presentation last summer, midsummer, that talked about a September release of their beta. So if you go to their website, you have to dig a little bit, but you can find a beta version. Interestingly, it does not have the map visualization. It has a filter-by-map feature now, which is very helpful. But it doesn’t have a visualization; it’s not putting all of the points on a map or anything like that. But that beta version hasn’t been fully released yet. And who knows why. It could be funding, right? We don’t know. That’s an assumption. It’s a weird time. But there was a parallel project that started, or at least our awareness of it started, after

Jed Sundwall: Yeah, well, it’s a weird time to be NOAA. Yeah.

Brad Andrick: our project was already underway. And we finished our development work last September. And yeah, where did that beta go? I did check before this call. I was like, let me follow up on that on LinkedIn with the product owner that I saw from the webinar, and I followed up just to kind of see. And they were just somewhere giving a presentation about something related to the Storm Events beta that’s still underway.

The deadline has passed, but it’s still moving somewhere. So that’s at least a good sign.

Jed Sundwall: Yeah. Okay. Well, that’s good to know. Yeah. No, it’s such an interesting data product that NOAA has here. And as I was looking at it, you know, thanks to your Explorer and also the Parquet file, you know, it’s fun to be able to go back and look at this old data. And then also to understand that, like, there’s certainly,

there are events, like, there’s clearly some bad data in there, I think, in the sense that, like, I’m pretty sure Katrina didn’t, there’s no, sorry, I’m like so morbid, I only care about deaths. It’s like, there’s the direct deaths, and then there’s the indirect, you know, fatalities or whatever. And I think they’re, like, zero for Katrina, which is almost certainly not true. But I guess, did you have to deal with this as you were going through?

Certainly fields were added over time, you know, for which there’s just no data for historical events. How much of your work was caught up in that kind of stuff?

Kwin Keuter: It was a few months of work for me, I mean, alongside other projects that I was working on separate from this. But yeah, I spent quite a bit of time digging into the data, you know, as it’s presented by NOAA and NCEI, and trying to make sense of where some of those gremlins might be hiding. Yeah, so I don’t have an answer for the,

you know, why indirect deaths might be undercounted in certain events. On the other hand, the dataset also has property damage and crop damage. I spent a while looking at that, because, one, the way that those damage numbers are presented, it’s not just numerical. It’s not just like a million dollars. It’s like a one and a capital K.

You know, you have to parse these text fields, and, you know, so I got that working and that’s all fine. But then in about April or May, you know, Mikel Maron, our boss here at Earth Genome, he mentioned NOAA’s billion-dollar disasters dataset. Because we were wondering, like, okay, well, that’s got, you know,

Jed Sundwall: wow.

Jed Sundwall: yeah? Yeah.

Kwin Keuter: property damage information, like the Storm Events Database has. How similar are they? Are the numbers exactly the same? So I spent a little bit of time comparing those numbers. And the billion-dollar dataset, I’m assuming that because it’s been focused on figuring out the overall economic impact of a large-scale event, they probably dialed in the methodology for counting that damage

you know, more precisely than perhaps this Storm Events dataset has. So typically the billion-dollar dataset numbers are higher, probably because they’ve done a more complete sort of evaluation. But then it was interesting, because then in May, you know, NOAA announced that they were going to retire that billion-dollar dataset. So that caused some…

Kwin Keuter: I wouldn’t say alarm on our part, because everything looked fine for the continuity of the Storm Events Database from now on, but we just had to wonder, like, is this going to be next? But so far it’s been kept up to date every month, except for during the shutdown.

Jed Sundwall: Right. Yeah, gosh. I mean, it’s so, you know, it’s inevitable that if you look at any kind of dataset like this, you’re going to get into all these really interesting questions, where you see a number there and it’s like, this is how much damage there was. And you’re like, well, how do you know? Like, what model did you use? And to your point, you could use whatever model the economist who runs the billion-dollar disaster thing uses. Who, by the way, the guy’s name is Adam Smith. He’s an economist at NOAA, or used to be.

Jed Sundwall: Because of, presumably, DOGE or something like that, he’s not there anymore. But anyway, it’s just funny. He’s an economist named Adam Smith. I don’t know him. But anyway, so he has a model to figure out what counts as a billion-dollar disaster. And the point being that whatever number is in those cells, there’s a lot of thinking that has to go behind it, that’s never documented. Or it’s very, very rarely documented.

Kwin Keuter: Yeah.

Jed Sundwall: Anyway, I just want to make sure that it’s clear that I’m not picking on you all if there are issues with the data. It’s just one of the, sorry, but I want to hear from you. But that’s kind of why I’m curious to know if NCEI is involved, because I think one of the great things about this Explorer is that it should allow for a feedback loop to NCEI. But my guess is that there’s no one on the other end of the line.

Kwin Keuter: Yeah, I would love to have that feedback loop too. In a previous job, I spent three years as a contractor at the US Geological Survey, working on The National Map and delivering those data products. I have found myself on the other end of the line. It was like, how do we talk to users? I would love to hear,

you know, from the users of these data products, what feedback they have for us. And maybe that was just not in my scope of responsibilities, but regardless, I would love to see more of that feedback loop, like you mentioned. Yeah, one specific sort of data quality issue

that I wrestled with, and I think where we ended up actually works pretty well, is the location data. So in the Storm Events dataset, again, you have 55 different event types. How do you represent where an event happened? For some event types, like a tornado, the database has an actual point location, or even a series of points, where that tornado happened.

But how would you do that for a heat wave, which is another event type? You can’t really use a point location for a heat wave. So there, the location would be represented by the county or forecast zone that experienced that heat wave, or whatever the event type was. So right there, you’ve got two different things: points, and a boundary, a polygon.

The way that we sort of merge those together is just by taking the centroid of that boundary. So that’s fine. But I also found that the way that those boundaries are assigned isn’t always, well, there are some cases where they would say, it happened in this state, and here’s the FIPS code for the county where it happened.

Kwin Keuter: And we’ve got a dataset of, you know, county FIPS codes. And sometimes the FIPS code doesn’t line up with any known FIPS code in that state. So what do you do then? Well, there’s also the name of the county. So I have a little bit of logic in the data ingestion pipeline where it’s like, okay, if the FIPS code didn’t match up, we’ll try the name. But,

despite that, there’s still about 1% of the events where, try as we might, we could not figure out exactly which county or forecast zone the event was in. All we really know is what state it was in. So that’s something where I’d be like, oh, I’d love to talk to someone at NCEI and say, here are the events that I flagged where I don’t have this location. Could you try to fix that?

And maybe they can’t, maybe that event was 30 years ago and there’s just no way to know, but it’d be nice to have that conversation.

Jed Sundwall: Yeah, things are lost to time. I don’t know if you saw, I mean, this is a plug for a talk from CNG last year, where Drew Breunig, it was the last day, but Drew gave this talk about sort of the origin of railroad time, and how, for a really long time, I mean, for most of human history, we didn’t have standardized time. The idea of time zones and being set to

you know, Greenwich Mean Time, was not a thing. And so, if you showed up in another city, you’d have to find the town clock and be like, okay, this is how they keep the time here. And, you know, just what an unlock it was for us to get a standardized way of referring to time, how that enabled commerce and travel and things like that. And I’m just remaking his point for him here. He’s like, we need that for place. You know, there is no,

like, common way to refer to a place. There are FIPS codes, you know, there are county boundaries and stuff like that, county names, but it can all fall apart really easily. And so standardizing that is really important.

Kwin Keuter: Yeah. Well, another thing that I took away from Drew’s talk: it wasn’t just about railroad time. One of the points he made was, it’s not enough to just post data, you know, make the data open and just stop there. You have to actually make it easy to use. And so that’s where, again, it’s like, we had an open dataset.

But here we tried to make it really easy for as many people as would care to use the data, to give them as many options as possible to easily access it. So yeah, again, a good plug for the CNG conference in Snowbird next October.

Jed Sundwall: Yeah. I didn’t pay anyone to do this. So thank you, Kwin. Yeah. Actually, yeah, I’m gonna put, I’ll put a link to Drew. So the funny thing about Drew’s talk is that we did, we failed to record it at the conference. So we had him redo it just like as a webinar and recorded it. And so it’s up on YouTube, but yeah. Well, I mean, yeah, I’m curious to get, you your, your take then on like,

your kind of ideal

user or like how you’d like to see this data being used more like in the future if you’ve given thought into, you know, as you discussed before, like the personas for the explorer, but then also for the data and source. Do you have like any imagination of like who might be doing stuff with the parquet file or all the CSV files that are there in source?

Kwin Keuter: Yeah, that’s, I mean, I would love to just, you know, sort of the LinkedIn crowd that wants to, you know, hey, like, I had an idea. And so I spent, you know, two hours on Saturday, digging into this data set that I heard about, you know, one, I would love to just see like, people in our community in the cloud native geospatial community.

just sort of engage with this data. I have a bit of a hard time telling people who their job is not my job. I’m not a researcher. I’m not a insurance underwriter. don’t know. I want to tell them how to, like, don’t do it that way. Use this data set instead. But.

Yeah, so Brad, who’s your ideal user?

Brad Andrick: It’s a great question to think about because I think that’s one of the problems with honestly this sort of project that we engaged is a great idea, right? And then we have some follow on ideas too. We also have other projects that we need to work on, right? So like, how does that get picked back up? And for us, it generally would be like, oh, someone has this idea.

And they cobble together the funding to be able to support that. And that then turns into one of the many projects that might come back up underneath Kwin or I or somebody else. I think the communication part of that too is something that we need to maybe do more with. And ESIP comes to mind. It’s a great community to get out and talking with. They just had their conference, the virtual one. I think the next one’s out in July.

So that’s probably the direction to engage more directly, because those are little one-off events that we can hit, try to spread the word. But we are, like many organizations, bound by funding, resources, other projects, and priorities, and those sorts of things. So figuring out where that follow-through is and how it happens is maybe a good question to think about.

Kwin Keuter: Yeah, well, here's a little anecdote, for whoever out there is listening, about how you could use the data. I live in Colorado, and in January, when much of the US, especially the eastern states, was experiencing severe winter weather, here in Denver we had a few inches of snow, hardly anything,

daily highs in the 50s and 60s. So I was wondering, what are the historical winter weather trends in Colorado and in the US? Using the Parquet file and some queries against it, you can find out very quickly: January is the top calendar month for winter weather events in the US.

And that's in terms of the number of events, the fatalities and injuries, and the economic damage. But in Colorado, with just a different query, it's actually March and April that have the highest economic impacts from winter weather. And to me that was no surprise, because typically we're patient about

large snowstorms; we know they're not going to come until March or April. And as a consequence, that's apparently when they wreak the most havoc in terms of economic damage. That took me maybe 15 minutes to figure out. But I think there's an open question here, and I'm glad we're talking to you, Jed.
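(Editor's note: a minimal sketch in Python of the kind of count-by-month query Kwin describes. The rows, column names, and event-type list here are toy stand-ins, not the published Storm Events schema.)

```python
from collections import Counter

# Toy records standing in for Storm Events rows; the real dataset's
# column names and event-type vocabulary are assumptions here.
events = [
    {"month": 1, "event_type": "Winter Storm", "state": "COLORADO"},
    {"month": 1, "event_type": "Winter Weather", "state": "KANSAS"},
    {"month": 1, "event_type": "Blizzard", "state": "NEBRASKA"},
    {"month": 3, "event_type": "Heavy Snow", "state": "COLORADO"},
    {"month": 4, "event_type": "Winter Storm", "state": "COLORADO"},
    {"month": 6, "event_type": "Hail", "state": "COLORADO"},
]

WINTER_TYPES = {"Winter Storm", "Winter Weather", "Blizzard", "Heavy Snow"}

# Count winter-weather events per calendar month, like Kwin's query.
by_month = Counter(e["month"] for e in events if e["event_type"] in WINTER_TYPES)
top_month, n = by_month.most_common(1)[0]
print(top_month, n)  # the busiest winter-weather month in this toy sample
```

Against the real data, the same aggregation would run directly on the Parquet file, for example in DuckDB with something like `SELECT month, count(*) FROM read_parquet('storm_events.parquet') GROUP BY month` (file path and column names assumed).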

I think all three of us love the idea of data products. We inherently see, and have a visceral feeling for, the value of these data. But it really depends on someone else getting that feeling too. So how do you market a data product? What have you learned from trying to do that?

Jed Sundwall: Sure. Well, that is the whole point of this podcast, to explore these questions in public. One thing is having a name. Even a name like the National Storm Events Database is fine. Landsat is the one I always use because it has a brand name. And my hilarious joke about Landsat, and this is true, is that

I'd been working with Landsat for at least five years before I realized it stood for "land satellite." I was like, oh, land satellite. It was not obvious to me. So just thinking of it in terms of a product is really important. Yes, you have to give it a name that makes sense to people. And we have a comment from Akis on YouTube asking, how do you market data products?

It's kind of like any other product. You have to figure out how to explain its value proposition, make sure people know about it, build awareness, talk about it a lot. And it's the kind of thing that our world of nonprofit people tends to ignore entirely, which is that it is sales. You've got to find channels.

This is what salespeople like to talk about, channels: how are you going to get your message out to a lot of people at once and make sure they're the people you want? And our community, again, this is why we started the podcast, to try to elevate this conversation, we don't talk about this kind of stuff enough. There's another interesting dimension, though, in 2026, and there has been for the past few years, which is

that a lot of users of data aren't going to be people. They'll be agents and things like that. So I'm curious, do you all think about that at Earth Genome now? Making data AI-ready or whatever?

Kwin Keuter: Well, I have another anecdote that's relevant here. For the Parquet work, I really relied on a Chris Holmes project; I know he has collaborators on it, a tool called geoparquet.io. When I was trying to figure out

how to write these Parquet files, what's the best way to do this in January 2026, the LLM I was talking to referred me to Chris Holmes's geoparquet.io project. And when I went to look at the GitHub, it had only really been live for about a week. This was in January. Yeah.

Jed Sundwall: Nice.

Jed Sundwall: Yeah, it’s like brand new. Yeah.

Kwin Keuter: So somehow Chris Holmes has figured out how to market his stuff to AI so that it can be marketed back to us. I thought that was fun. I think he leaned into using Claude's skills definitions, which are just markdown files that explain to an agent,

Jed Sundwall: That is crazy.

Kwin Keuter: Here’s how you could use this library. maybe that is what got that to work.

Jed Sundwall: Maybe, yeah. Okay, I have an email in my inbox I need to get back to Chris on; I'm going to ask him. I've been aware of this project, and I'm very surprised to hear you mention it. There's another guy named Nism Libovitz, I think, I hope I'm pronouncing his name right, who's also been leading on this. It is brand new. That's awesome, that's crazy. That's a really interesting anecdote about how

the AI, Claude, I guess you were using Claude, was aware of it. Or was it?

Kwin Keuter: I think I was using, I don't know, Gemini maybe, or something like that.

Jed Sundwall: Okay, interesting. But regardless, right away the models have been able to figure out other stuff people have been working on that would be relevant to you. It's a model helping matchmake among humans, which is crazy. That's super cool. Everybody check out geoparquet.io. Brad, I'm going to put you on the spot: have you been thinking about the rise of agents as data users as well?

Brad Andrick: A bit, but if I could, I'd go back to the marketing question for a second. I think the why is really important. I think about why this data set from NOAA matters. We're talking about who our persona is, and that's important, but also, what is the overall impact potential of this data set? And to me, it's

Jed Sundwall: Sure, please.

Brad Andrick: potentially where billions in resilience funding and infrastructure investment must go. It can dictate that level of direction. So how many people with that level of influence are looking at this data set, or aware of it, or using a report that's maybe rolled up five levels with no actual tie back to the data set? It also speaks to how a lot of open data gets lost out there. Think of this:

I spent a year working for a local government. We had a brilliant open data portal. And it wasn't just an Esri open data portal, although there are plenty of those out there in local government, and that's great for local resources. But then how do you search 15,000 local government open data portals to find what you need? There's that next step beyond. And then the marketing question ties in there:

who's responsible for the marketing? Because a lot of the people building the open data stuff, they're not marketing, they're not sales, and there might not be any budget at all for that. So you do kind of what we do: we go to a conference and talk about the work. Great, we got it out there. We go on a webinar and talk about the work. All right, here it is, there you go. But where is that gap, and how does it get filled? I don't have a great answer to that.

Jed Sundwall: Yeah. And so this is our message to policymakers too, people who are writing open data policy or funding open data programs: we've got to get ourselves out of this idea that you just open up the data and you're done. It's like, well, no. What are you doing this for? Who are you doing it for? Are you sure you're reaching them? Do you have any way of knowing whether or not it's useful?

And to your point, Brad, none of that stuff is funded. It's all just, throw some CSVs up on the internet and say, all right, our job is done here. No other media production operation behaves that way, where you just put stuff out there to see what happens. Anyhow.

It is a gap, and that's why we're working this stuff out, like I said before, in public. I want to point out that Akis, again, said he was very interested in knowing more about the user stories. I don't know if there's anything you all have shared already, any sort of information about the background of the project.

I shared the Commons blog post, but I don’t know what else would be out there.

Brad Andrick: Yeah, so the Commons blog, that's a great one because it's framed as a case study, and it gives a good amount of the background. There's maybe a little bit more in the discovery-and-process post, which has some screenshots of some of the categories we targeted. That's on earthgenome.org/blog, and then you'll find the one

Jed Sundwall: Okay. Okay.

Brad Andrick: for the storm events, I think it's still the first one, because it's the most recent, maybe. It has a few more details in it. As far as the process goes, I can speak to that a little more generally. Sometimes we do all the work to build a persona, the traditional design exercise where we have

a persona mocked up: these are their skill sets, this is their description, these are their pain points, and so on, to get one or two generalized personas built out. We did not do that specific exercise for this project. There was some more general work when we got on the project, and then there was the user survey. From that, we landed on a requirements list that went back and forth between our team, the Internet of Water team, and the Duke school

to refine it down: this is how much budget there is, these are all the requirements we'd love to have, these are the most important ones as ranked by the user survey we got back and analyzed. And then we decided, okay, this is where the project needs to live.

Jed Sundwall: Got it. Okay.

Fascinating. But then, once again, there's a project that needs to live, and I think that word "live" is doing a lot of heavy lifting. For this thing to, sorry, this will be a little cheesy, come to life, it needs to live in people's minds. People need to be aware of it. I just left another comment on YouTube: if people can't find it, it doesn't exist. It's irrelevant if it's just sitting out there and no one knows about it. Yeah.

Brad Andrick: 100%. And when we joined this project, we did that kind of discovery phase, one, to talk to users, but also we Googled "NOAA storm events database" to see what popped up. And there was an ArcGIS dashboard, not a story map but a dashboard kind of interface, that existed at some point, up to a moment in time, and then the data was never updated and

Yeah, that just went away.

Jed Sundwall: Yeah, and here we are. Well, this is the other thing, I think. Talking about NOAA providing a better interface to this is interesting. It's possible that this beta comes out and then moves into production, and what you've created is made obsolete by it. And that's fine; that's probably a perfectly fine outcome. But it also reveals an interesting

challenge, especially in the context of data products: you have the data product, and our concept of that on Source is that it's a collection of objects. In fact, the story you've told about this data product explains why that's really important to us. We take an object-based, or what some people might say file-based, approach, which is that

if you did have a bunch of paper records of storms, on a ledger or cards or something like that, that's kind of timeless. It's not locked up in anything; they're just paper records. Of course, those are expensive to preserve and maintain. As we move into digital things, for a long time we've been, I would say, beset by database people, whom I love; some of my best friends are database people.

But they say, no, this needs to go into a database for all these obvious benefits. And we say, well, yeah, sort of, except it does start to impinge on accessibility and portability, or it can, depending on what kind of database you use. We want to keep a file-based approach because it feels more timeless, and it can live anywhere. So we think of products as a collection of objects. And in the case of what you publish on Source,

it's all the CSVs and this Parquet file. I would say if NOAA could only do one thing, it should just do that and then get out of the way, because it allows people like you to build tools like your Explorer much more easily. But very few government agencies can get away with that. They can't just say, here's a Parquet file, good luck. There's an executive somewhere who wants a dashboard, and that's fine.

Jed Sundwall: All of those dashboards and visualizations are expensive. But go ahead, Kwin, sorry.

Kwin Keuter: They have a need to market themselves and market their data. And yeah, if it's just a CSV file,

they're going to want to go a step beyond that. But your conversation with Denise Ross, I think she made the point that if a data set can only be produced by a government, they need to make the care and feeding of that data set their priority. They need to do

whatever it takes to keep collecting it, because that continuity is more valuable than a user interface. As long as the data is open, anyone outside of government, like us, can come along and build and innovate on it. I was glad you said that, because I hadn't heard anyone articulate it before.

Kwin Keuter: Yeah, I think we need to say that more often, rather than just be passive consumers of public data.

Jed Sundwall: Yeah. I mean, everyone knows Denise is very good; that conversation was really great. I need to go back and re-listen to it, because there are some amazing gems in there, and we've gotten feedback on that one. I haven't said it explicitly, but this whole conversation I've been thinking about what she and I talked about: the lack of feedback loops. This is why I was asking, is NCEI at the table? Is there anybody there

Jed Sundwall: at the table? And Brad, like you said, that kind of customer engagement, front-facing work is very, very rarely funded. And it's a problem, because you almost have no choice but to be a passive consumer; in many cases there's just no way to actually interact with the provider of the data.

Go ahead, Kwin, were you going to say something? Okay. Sorry. Well, we're reaching the end of the hour, so I'm going to say this: I want people to submit talks for CNG 2026. One bit of feedback we got from last year is that we had too many sessions that were very good, and people just couldn't choose. So we're going to have fewer sessions; we're going to be a bit more picky.

Jed Sundwall: so I encourage you all, Genome or, you know, either of you to submit stuff. You don’t have to say what you’re going to submit about, but, and also anybody listening as well.

Yeah, I’d be curious. Oh, yeah.

Brad Andrick: Kwin and I have already talked. Yeah, we've got ideas. It's kind of like, which one? Since it seems like a more focused effort this time, which idea do we pull the trigger on and submit?

Jed Sundwall: Okay.

Jed Sundwall: Okay, here’s a clue is like we want end users. We want like impact, you know, this is sort of a problem our community has is that we are very geeky and we’ll geek out on talking about file formats and stuff like that and technical specs, but we want to talk about actual use like end user impact.

Kwin Keuter: Yeah, yeah. And Jed, I appreciate your encouragement and the comments on YouTube, also the implicit encouragement to keep talking about these valuable data sets, not just post them and forget them, but to continue those conversations, because

I think that's how their value can actually be fully realized. So that's one of the goals for this year: just keep talking about it.

Jed Sundwall: Yeah, yeah. I've got another email in my inbox about this, and Akis just left another comment: how would this work and get funded at the open data product level? That's kind of the point. You have to think in terms of products to defend these things. The other email I have in my inbox is about, hey, how do we prioritize which data sets from the government are at risk? And

this conversation has been going on for well over a year now, ever since Trump came back into office and things started going a little haywire. People are asking, what are our most important data assets? And people kind of don't know. There's just data here and there. Because the data has never been conceptualized as a product that has users, with an understanding of how much it costs to maintain, these conversations are super hard. So I can say

my response to Akis is that that's the whole point of referring to data products as products: you have to fund them discretely, or at least think about defending them at an individual product level. Eventually, the way it should be, someone has a portfolio of products that they're stewards of, and they might have to confront a time where it's like, you know what, no one uses this product, nobody cares about it, it's time to let it go.

Kwin Keuter: Yeah. Well, just to be clear for anyone who wants to know, our plan for the Storm Events Database Explorer is to sync the data with NOAA's data every month, through at least October this year, probably longer. It's actually very easy to do; we just did it, no big deal. So this will be a resource.

Jed Sundwall: Yeah. Yeah.

Kwin Keuter: It’s not like we just did this once and we’re stopping. It’s going to continue to be updated as long as Noah keeps updating their data.

Jed Sundwall: Great. Well, I actually failed to ask this before, but what is the application itself? What kind of application is it? I imagine it's a pretty modern, lightweight web app.

Brad Andrick: Yeah, it’s the React front end. Mapbox is the MapRendering library in there. It’s fast. It works on a mobile app. We’re responsive. It’s pretty lightweight overall. I don’t know if the phosphor detox from this year are recorded or if they’ll be out at some point, but there was one there that went into more detail on some of the fun behind the scenes database work that

Kwin helped with and using PGTileServe and deciding not to use PGTileServe and going back to DOJSON for the data aggregation component of it. I could talk for way, way long here about the decision of a dot grid as a data aggregation type over H3 or something else. So there was, it was a great, I love, it’s like the place I love to live, digital cartography land, but it’s pretty lightweight and.

Jed Sundwall: Okay.

Brad Andrick: to circle back to the agent question that I avoided: you made a comment that I think is very true, that we've had a bit of a sea change, where it's a lot easier to get things done faster, at least to the prototype stage. We talk about this a lot internally right now: what does the design process look like for us? Are we iterating in code now? Because it used to be that code is expensive, so you do it last.

Jed Sundwall: yeah, sure.

Brad Andrick: Well, that’s kind of changing in the landscape, at least for the prototype stage. but as far as Noah putting the data out there and just being like, here’s the end point, scripting down those CSVs. I don’t like Kwin did a lot of work to do that, but these days it’s, it’s not too wild to script those and pull them down right now. Like cleaning up portion. That’s where a lot of the thought work.

went into, but like the, the, the action of pulling down all of the data and then throwing it into a database. Like there’s work to get there to get it to work, to work at scale. we’re at 2 million over that now events in the database. So that starts to be like an annoying, it’s not big data, but it’s a big enough to be annoying that you need to consider aggregation methods and all those things and multiple filters being combined and dynamic querying and all of that.
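(Editor's note: the dot-grid aggregation Brad mentions boils down to snapping each event to a fixed cell and counting. A sketch in Python; the 0.5-degree cell size and the sample coordinates are illustrative assumptions, not the Explorer's actual parameters.)

```python
from collections import Counter

def grid_cell(lat: float, lon: float, size: float = 0.5) -> tuple[float, float]:
    """Snap a coordinate to the lower-left corner of its grid cell."""
    return (size * (lat // size), size * (lon // size))

# Three illustrative event locations around Colorado.
points = [(39.74, -104.99), (39.73, -104.98), (38.84, -104.82)]

# One pass over the events yields per-cell counts the map can render as dots.
counts = Counter(grid_cell(lat, lon) for lat, lon in points)
print(counts.most_common(1)[0])  # densest cell and its event count
```

A regular lat/lon dot grid trades the shape fidelity of hexagonal schemes like H3 for trivially cheap, dependency-free bucketing, which matters when filters and queries are being recombined dynamically.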

But anyway, I think there's maybe a world ahead where that sort of interface is something you can spin up much more quickly, tweak here and there, and make much more open, collaborative, and easy for people to access.

Jed Sundwall: Yeah, absolutely. And I think this is the great future we live in. I'm a broken record on this: it gets cheaper to do this stuff every day. Every day it's cheaper and easier to build a tool like this. We live in a future where I can send somebody a URL and they can interrogate 2 million rows of data in a browser, anywhere in the world. That's awesome. The lesson from that, to me, is that

we need to be investing in making sure the data itself is really high quality and keeps being produced at the core, so that anybody can build on top of it. All right, any last words before we wrap up? Anything you want our audience to know about or look at? We'll share links to all of your work in the show notes, of course, but anything else?

Brad Andrick: For anybody who follows our work: on Earth Index, the work we've been doing with searching embeddings, we've got some interesting things coming out. We went through the Google.org GenAI accelerator, which was an interesting learning experience, and we have some features beyond that coming out later this year. So there will be some interesting things to watch. And then I'll just plug CNG again: everybody should go.

Jed Sundwall: Yes.

Brad Andrick: As I said to Jed before, this is the conference. When I'm out places, I tell people, go to this conference. It's like, and hopefully this won't offend you, Jed, because you said you're trying to shift this up, you took all the technical people out of the FOSS4G world and put them in a room, and it's just the geospatial nerds, but also talking about the bigger, wider picture. There were some great sessions on the business side of things last year,

like how do we sustain open data longer term, and funding models. It was great. I love it. I'm going to be back this year, and everybody should be.

Jed Sundwall: Awesome, I'm so glad to hear that. Look, you said it better than I could, and nobody needs to hear it from me that our conference is awesome. So listen to Brad. All right, well, thank you both.
