Great Data Products


A podcast about the ergonomics and craft of data. Brought to you by Source Cooperative.

Standards

Episode 4: How Standards Emerge: Lessons from STAC


Video also available on LinkedIn

Show notes

Jed talks with Matt Hanson from Element 84 about the SpatioTemporal Asset Catalog (STAC) specification and its role in making geospatial data findable and usable. Matt describes STAC as “a simple, developer-friendly way to describe geospatial data so that people can actually find it and use it.” The conversation covers how STAC emerged from a 2017 sprint in Boulder with 20 people and grew into a specification now adopted by NASA, USGS, and commercial satellite companies worldwide.

Matt borrows Howard Butler’s term “guerrilla standards”: a grassroots approach where stakeholders build something that serves everyone’s needs rather than making bespoke solutions. The central thesis: adoption is the only metric that matters. You can have the most elegant standard, but if nobody uses it, it’s not a success. STAC succeeded through community collaboration, simplicity of the core spec, an ecosystem of open source tooling, and timing, arriving just as cloud storage matured and satellite data exploded.

The conversation ranges into the limitations of remote sensing (“Remote sensing sucks,” Matt says, pointing to 20-30% error rates in land cover products), the future of purpose-built satellites, and why new data institutions are needed to validate emerging data products. Matt and Jed also discuss the credibility problem: launching a successful standard requires champions who have earned trust in the community. As Matt notes, “You have to earn credibility” – there’s no shortcut to building the relationships that make standards adoption possible.

Takeaways

  1. Adoption is the only metric that matters — An elegant standard nobody uses isn’t a success. A “crappy” standard everyone adopts improves lives and enables interoperability.
  2. Guerrilla standards work through buy-in — When people are part of the process, their needs get addressed and they become champions who use the standard internally.
  3. Simplicity drives adoption — STAC focused on meeting 80% of needs with a simple core spec rather than trying to cover every possibility.
  4. Timing matters — STAC arrived when cloud storage matured, COGs gained traction, and satellite companies were launching rapidly. The previous methods weren’t working.
  5. Credibility can’t be skipped — Standards efforts need champions with established reputations. Chris Holmes’s involvement and relationships were essential to STAC’s early traction.
  6. Remote sensing has real limitations — 20-30% disagreement between land cover products is common. The value of remote sensing is in relative differences and time series, not absolute measurements.

Transcript

(this is an auto-generated transcript and may contain errors)

Jed Sundwall: Welcome to Great Data Products. This is a live stream webinar podcast thing from Source Cooperative where we talk to data practitioners about their craft. We do this every month, and you can visit us at greatdataproducts.com to see previous episodes and find links to subscribe on YouTube or wherever you get your podcasts. If you follow Source Cooperative on LinkedIn, we notify people about it there also. And we also have a Luma calendar.

You can see the next episode on greatdataproducts.com, but we actually have episodes scheduled out in January and February that you can see on Luma. I’ll talk about that in a minute. But today we’re joined by Matt Hanson from Element 84, a good old friend, I would say. And we’re going to talk about the SpatioTemporal Asset Catalog specification. Matt, do you want to introduce yourself?

Matt Hanson: Yeah, thanks Jed. Really happy to be here. Thanks for inviting me. I’m Matt Hanson. I work at Element 84, and I’ll give a brief background. I’ve been working in the remote sensing field for, geez, close to 30 years now. I got into open source about 15 years ago: I went to FOSS4G and was instantly like, this is it. This is what I want to do.

GeoNode was the first open source project that I contributed to. Then I started working on other projects and eventually got into STAC and standards.

Jed Sundwall: All right. Nice. Well, I can say we’ve been lucky to have you in the community for a long time, and we’ve got a lot to talk about. You’ve done a lot, you’ve accomplished a lot, and I would say your involvement in STAC has really secured your legacy. I mean, you’re among others; it’s a community effort, which is partially what we’re going to talk about here. So, you recently... boy, let’s actually back way up.

Can you tell us how you describe STAC to people? With the caveat that this podcast is not necessarily a geospatial podcast: we do want to reach more people who don’t necessarily have expertise in geospatial. So how do you describe STAC at a very high level?

Matt Hanson: Yeah, so I describe STAC as a family of specifications as well as an open source ecosystem. And that’s maybe not a really layman’s way to describe it. So let’s say that it’s a simple, developer-friendly way to describe geospatial data so that people can actually find it and use it. That’s the quick one-sentence version.

Jed Sundwall: Okay. So I’m going to play lay person, and I actually don’t even have to pretend that much. I’m actually this naive in a lot of ways. I’ve heard Mark Korver, another esteemed colleague in this world, describe STAC as solving the problem of listing objects in S3.

Of course, I can’t help but be very nerdy here, but part of the problem that we’re facing, and it’s not just in the geospatial community, it’s in many other domains, is that we’re dealing with so much data that even just listing the files that you have is expensive. It takes time. You can imagine having a corpus of millions and millions of satellite images, and you have to go through that haystack to find stuff. One way to characterize STAC is that it helps make it

Matt Hanson: Mm-hmm.

Jed Sundwall: basically easier to index all that stuff so you can find what you want. Is that fair to say?

Matt Hanson: Yeah, I think that’s definitely fair to say. Tying it to S3 isn’t necessarily required, though, right? STAC can describe data files wherever they are; they don’t have to be in object storage. But no, I think that’s a good way to talk about it. When I give a STAC presentation for new folks, like a STAC 101, I often will talk about

exactly this issue of the explosion of geospatial data. There’s been so much data, and if you look at just NASA’s holdings and their projected holdings over the next five years, we see so much data. I have this saying: if your data is not indexed, it might as well not exist. Because if nobody can find the data, and, as you say, you’re just getting a listing of all the files, how can you actually find

the data that you want if there’s a billion files in object storage, let’s say? And that number is not all that far-fetched. Yeah. No, as I was saying, look at Sentinel-2. In the entire Sentinel-2 archive there are 25 million scenes, and for each scene there are 20 files. So it starts adding up really quickly.

Jed Sundwall: No, yeah, not at all. Go ahead.
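For a sense of scale, the Sentinel-2 figures Matt quotes can be turned into a rough count of objects and of listing calls. This is a minimal Python sketch, not a measurement: the 1,000-keys-per-page figure is an assumption based on the maximum page size of Amazon S3's ListObjectsV2, and other object stores page differently.

```python
# Rough scale of "just listing the files" for the Sentinel-2 figures quoted above.
SCENES = 25_000_000          # scenes in the archive (figure from the conversation)
FILES_PER_SCENE = 20         # files per scene (figure from the conversation)
KEYS_PER_LIST_PAGE = 1_000   # assumption: S3 ListObjectsV2 returns at most 1,000 keys per call

total_files = SCENES * FILES_PER_SCENE
# Ceiling division: a partially filled last page still costs a full request.
list_calls = -(-total_files // KEYS_PER_LIST_PAGE)

print(f"{total_files:,} objects")
print(f"{list_calls:,} list calls before a single file is opened")
```

Half a million paginated calls just to learn the file names is the kind of cost an index is meant to spare its users.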

Jed Sundwall: Okay. And then when you say easy, easy for who? STAC stores its metadata in JSON. So who’s the typical user of STAC? What kind of software do they use? What kind of job title do they usually have?

Matt Hanson: Yeah, geez, that’s a good question. I think that the ultimate data user is probably a data scientist, and I think that’s who the original target was. When we first started looking at this, we were primarily looking at public data sets because that’s what was available. What we were looking to index was NAIP and Landsat and Sentinel-2. And it was really a

data science user problem. That was my background. That’s where I come from: working with scientists and working with different types of data, and having to use different formats and different tooling just in order to find and access the data. So I think that really was the primary user. We talk about it being developer friendly because of the open source ecosystem,

but that’s really developers working in tandem with data scientists in order to leverage and use the data.

Jed Sundwall: Great. Yeah. I mean, I’m leading you here a little bit, the point being that I’ve worked in the open data space for basically my entire career at this point, and so many conversations have revolved around making data easy for anyone or something like that. I argue that that hasn’t worked out super well. You actually need to find out who the actual practitioners are who are going to use the data, and what they will be comfortable with,

or what will actually help them, rather than having a kind of nebulous “everyone.” Yeah.

Matt Hanson: Yeah, yeah, it’s clearly not everyone. I mean, we have had journalists: people have reached out to us from the New York Times who are creating stories and want to access geospatial data, and they’ve used some of the tooling around that. So that’s as close to a lay person as I think we’ve really worked with:

journalists who want to tell a story and just want to find data. They just want data from five years ago and today to look at a change over time and use it to write a story about it. And they were able to use the tooling, like pystac-client, and even before that sat-search, an earlier tool set, and figure it out. But they were still leveraging developers to

do that.

Jed Sundwall: Right. Well, I think there’s another clue here. Staying with journalists: you have an audience that typically has not been able to engage with imagery or geospatial data, but, as we’ve watched happen throughout our lives, they are becoming more savvy and more aware of the need to use software and data to tell stories. But they’re coming to us from

a completely different place than I think most geospatial data practitioners were in previously. So the key there, I mean, you mentioned pystac-client: for whatever reason, a lot of journalists use Python. Different communities use different tools. Yeah.

Matt Hanson: Yeah, right. Sure, yeah. Language of data science, yeah.

Jed Sundwall: Yeah. Okay. We actually already have a question on YouTube from someone I’m just going to call Sig. I’m not sure if that’s his or her name; I can’t tell. They’re asking: STAC is built around sharing data easily with anyone. Let’s say you want to use it to share more secret data with access control, SSO, encryption, et cetera, with different users that have different access to different data sets. I have some thoughts on this, but, as you mentioned, STAC doesn’t have to be explicitly tied to

a cloud object store or a public bucket. Do you want to take that? I imagine you have some actual examples here.

Matt Hanson: Yeah, so this question comes up a lot, right? Because out of the box... so I will get a little bit more technical here. We have an API called Earth Search that indexes public data sets on AWS, and that’s an implementation of the STAC API. That implementation, for example, has no authentication in it, because we were using it originally to index public data. So

we didn’t need to control access, all the data was public, and so we hadn’t added that. And so we get that question a lot. The stac-fastapi project is another implementation that didn’t have really core built-in authentication at the time it was first created. So there are a couple of ways to do this. I’ll jump to the end first, which is that there’s a more

modern solution for this called STAC Auth Proxy that DevSeed has created. It can be used to control access to individual items and collections based on attributes in the data, and it works pretty well. But what we’ve generally done is use a proxy. So you have your catalog, and that’s open. Or it’s behind a firewall, but it’s available to anyone who can access it.

Jed Sundwall: Okay.

Jed Sundwall: Interesting.

Matt Hanson: And then we have a proxy in front of that. It handles the authentication, queries the catalog, knows what people can see, and then returns that result. So everything goes through the proxy. But these tend to be one-off solutions. So if you haven’t seen STAC Auth Proxy, that’s definitely something to look at; you can combine it with

stac-fastapi or potentially any STAC API implementation.
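The proxy pattern Matt describes (an open catalog behind a proxy that authenticates the user, queries the catalog, and returns only what that user may see) can be sketched in a few lines of Python. Everything here is hypothetical: the allow-list-by-collection policy, the `proxied_search` function, and the in-memory `fake_upstream` standing in for a STAC API search endpoint. It is not the STAC Auth Proxy or stac-fastapi interface.

```python
from typing import Callable

Item = dict
Search = Callable[[dict], list]

# Hypothetical stand-in for an open STAC API: items tagged with their collection id.
CATALOG: list = [
    {"type": "Feature", "id": "scene-a", "collection": "public-imagery"},
    {"type": "Feature", "id": "scene-b", "collection": "restricted-imagery"},
]


def fake_upstream(params: dict) -> list:
    """Return catalog items, optionally limited to the requested collections."""
    wanted = set(params.get("collections") or [])
    return [item for item in CATALOG if not wanted or item["collection"] in wanted]


def proxied_search(params: dict, user_collections: set, upstream: Search) -> list:
    """Hypothetical policy: an authenticated user sees only allow-listed collections."""
    requested = set(params.get("collections") or user_collections)
    allowed = requested & user_collections
    if not allowed:
        return []  # nothing this user is permitted to see
    # Narrow the request so the upstream catalog does less work.
    scoped = dict(params, collections=sorted(allowed))
    results = upstream(scoped)
    # Filter again on the way out, in case the upstream ignores the collections parameter.
    return [item for item in results if item.get("collection") in user_collections]


print([item["id"] for item in proxied_search({}, {"public-imagery"}, fake_upstream)])
```

Narrowing the query and filtering the response are both done here as a belt-and-braces choice; a real deployment would derive the allow-list from the identity provider rather than pass it in.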

Jed Sundwall: Okay. So yeah, one thing I’ll underscore here is that STAC is a metadata spec. It itself doesn’t say anything about authentication or anything like that. It’s been built to be very flexible, useful in all sorts of environments, and extensible. I want to stay in the weeds of STAC a little bit longer. So the specification

Matt Hanson: That’s right.

Jed Sundwall: is made up of other specifications. You have the idea of, I’m going to go in order, a catalog, a collection, and an item. Can you walk through each of those and what they encompass?

Matt Hanson: Yeah, sure thing. So we start up at the top: that’s a catalog. A catalog is really just a container. It’s a JSON document that contains really simple fields: you have a name, you’ve got a title, you have a description. And then, most importantly, all of these entities within STAC have links. Links are probably the most important part of STAC, right? Because

Jed Sundwall: Yeah. Okay.

Matt Hanson: when we got into this at the beginning, the ability to crawl a catalog was really important, because that’s the way the internet works, right? By crawling things. So we wanted to be able to link a whole catalog together, link down to items and link back up, so that you could visit any part of the data in the catalog and crawl it both ways.

So the catalog is the starting point. In an API especially, the catalog is your landing page, and it contains links to the collections underneath it. Each collection looks a lot like a catalog. At one point a collection even was a catalog; it was derived from a catalog.

Technically, that’s not the case anymore. It’s its own entity, but it looks a lot like a catalog. Collections are ways to group together items and data that are similar to each other. The most obvious case is the big public data sets, Sentinel-2 or Landsat: Sentinel-2 Level-2 data, for example, is a collection.

Right, it contains a bunch of items, and that’s your next level down: an item. The item is where we move from JSON to GeoJSON, because an item actually represents a specific location and a specific time or range of times. That’s really where your data is. You can think of it as a scene, a footprint containing data. The data is contained

in what are called assets. So assets are really the fourth entity type, except they’re embedded directly in the GeoJSON of items. So you have the catalog, collections, and then items; that’s the general hierarchy, and links allow you to go all the way down from catalogs to items. Now, there are some nuances between a static catalog,

Jed Sundwall: Right. Okay.

Matt Hanson: what we call a static catalog, which is really just a bunch of linked JSON files on disk or as blobs in an object store, and a dynamic catalog, or what we call an API. That’s an important distinction. And there are nuances, because you can, for instance, have sub-catalogs within a static catalog.

That might be a little confusing, but it’s basically a way to partition the data. You can use sub-catalogs to organize it. So you might have a collection, and underneath that a catalog for each continent. Then you go into the continent, and that’s where your items are. So it’s just a way to partition and organize the data. This question comes up a lot, which is why I have the whole

narrative around it here, but in an API you don’t need those sub-catalogs, because you don’t need to partition the data: you can search for the data by what continent it’s in, or what path/row it is if it’s gridded data, or essentially partition on the fly on anything you want. So that’s the important distinction between static catalogs and an API. We get the question a lot.

People have static catalogs and they ask, how can I search this? And you can’t really search it. You have to index it first; that’s the missing piece. But with STAC originally, Chris Holmes really wanted us to focus on being able to have static catalogs, because not everybody wants to stand up a server and incur the cost of that. They just want to make data available and share it with people.

And the easiest way to do that is to just have the metadata on disk, all linked to each other, so you can crawl it and index it if you want to.
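To make the catalog, collection, sub-catalog, item hierarchy and the "crawl it, then index it" step concrete, here is a minimal Python sketch. The catalog is a hypothetical in-memory dict standing in for JSON files in a bucket, with all hrefs absolute for simplicity (real static catalogs often use relative hrefs resolved against the parent's URL), and each document is trimmed to the fields this example needs, so these are not spec-valid STAC documents. The `search` function is a toy stand-in for what a STAC API offers after indexing, not the STAC API itself.

```python
from collections import deque

# Hypothetical static catalog: each key stands in for one JSON file in object storage.
# Documents are trimmed to the fields this sketch uses; real STAC requires more.
BASE = "https://example.com/stac/"
STORE = {
    BASE + "catalog.json": {
        "type": "Catalog", "id": "root",
        "links": [{"rel": "child", "href": BASE + "s2/collection.json"}],
    },
    BASE + "s2/collection.json": {
        "type": "Collection", "id": "s2",
        "links": [{"rel": "parent", "href": BASE + "catalog.json"},
                  {"rel": "child", "href": BASE + "s2/europe/catalog.json"}],
    },
    # A sub-catalog used purely to partition the files, e.g. by continent.
    BASE + "s2/europe/catalog.json": {
        "type": "Catalog", "id": "europe",
        "links": [{"rel": "parent", "href": BASE + "s2/collection.json"},
                  {"rel": "item", "href": BASE + "s2/europe/scene-1.json"},
                  {"rel": "item", "href": BASE + "s2/europe/scene-2.json"}],
    },
    BASE + "s2/europe/scene-1.json": {
        "type": "Feature", "id": "scene-1", "bbox": [10.0, 45.0, 11.0, 46.0],
        "properties": {"datetime": "2020-06-01T10:00:00Z"},
        "assets": {"visual": {"href": BASE + "s2/europe/scene-1.tif"}},
        "links": [{"rel": "parent", "href": BASE + "s2/europe/catalog.json"}],
    },
    BASE + "s2/europe/scene-2.json": {
        "type": "Feature", "id": "scene-2", "bbox": [20.0, 50.0, 21.0, 51.0],
        "properties": {"datetime": "2021-06-01T10:00:00Z"},
        "assets": {"visual": {"href": BASE + "s2/europe/scene-2.tif"}},
        "links": [{"rel": "parent", "href": BASE + "s2/europe/catalog.json"}],
    },
}


def crawl(root_href: str) -> list:
    """Breadth-first walk down 'child' and 'item' links, collecting every Item."""
    items, seen, queue = [], set(), deque([root_href])
    while queue:
        href = queue.popleft()
        if href in seen:
            continue
        seen.add(href)
        doc = STORE[href]  # a real crawler would GET this JSON over HTTP or from a bucket
        if doc["type"] == "Feature":
            items.append(doc)
            continue
        for link in doc["links"]:
            if link["rel"] in ("child", "item"):  # skip 'parent' links so we never walk back up
                queue.append(link["href"])
    return items


def bbox_intersects(a: list, b: list) -> bool:
    """Overlap test for [min_lon, min_lat, max_lon, max_lat] boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(index: list, bbox: list, start: str, end: str) -> list:
    """Toy search over the crawled index. Comparing timestamps as strings works only
    because every timestamp here uses the same UTC 'YYYY-MM-DDTHH:MM:SSZ' format."""
    return [item["id"] for item in index
            if bbox_intersects(item["bbox"], bbox)
            and start <= item["properties"]["datetime"] <= end]


index = crawl(BASE + "catalog.json")
print([item["id"] for item in index])
print(search(index, [9.5, 44.5, 10.5, 45.5], "2020-01-01T00:00:00Z", "2020-12-31T23:59:59Z"))
```

The "europe" sub-catalog exists only to split up files; once the items are in an index, which continent a scene falls in becomes just another search filter.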

Jed Sundwall: That’s right. Yeah. I can speak to this. This was a long time ago now, when all this stuff happened. It’s relevant to another comment or question from Sig on YouTube asking: did the chicken or the egg come first, i.e., STAC or S3 and the cloud-optimized formats? I assume STAC wouldn’t exist with only old files on disk. There’s a lot to respond to there. First I’ll say,

This is a fundamental issue about the distinction between file storage and object storage that is just not obvious to most people, because they never have to think about it. If you’re using a normal computer, like a laptop with a GUI, you’re probably interacting with a file system. Your computer needs to understand what files are on your hard drive, and it has an index of them.

It also has an index of how the directories are nested and things like that, and you can search your computer for files. Otherwise, a lot of applications would be a huge pain to use. Object storage like S3 has nothing like that. With object storage, you have a file, you give it a key name, and you put it in the cloud. And it’s there. If you know that key name, you can get it back out.

And so this is the issue, going back to the discussion before about too much data. You can imagine a scenario where you have so many objects, so many files you’re dealing with, that even the index of them would be too large for your laptop. Just listing the names of the files would be too large for a lot of people’s local storage. This is not a crazy idea, let alone metadata about all those things. And so…

Matt Hanson: Hmm.
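The file-system versus object-storage distinction Jed describes can be sketched with a toy in-memory store in Python. `ToyObjectStore` is hypothetical and only models the idea: a flat mapping from key names to bytes, where a "folder" is nothing more than a shared key prefix, and answering any question the key names don't encode means listing keys and then opening objects one by one.

```python
import json


class ToyObjectStore:
    """Hypothetical, in-memory model of an object store: a flat map of key -> bytes."""

    def __init__(self) -> None:
        self._objects: dict = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        # Cheap and direct, but only if you already know the exact key.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # No directory tree exists; "folders" are just keys that share a prefix.
        return sorted(key for key in self._objects if key.startswith(prefix))


store = ToyObjectStore()
store.put("landsat/2020/scene-1/metadata.json", b'{"cloud_cover": 5}')
store.put("landsat/2020/scene-2/metadata.json", b'{"cloud_cover": 80}')
store.put("landsat/2021/scene-3/metadata.json", b'{"cloud_cover": 12}')

print(store.list_keys("landsat/2020/"))

# A question the key names don't answer ("which scenes are mostly clear?") means
# listing every key, then fetching and parsing every metadata object.
clear = [key for key in store.list_keys()
         if json.loads(store.get(key))["cloud_cover"] < 20]
print(clear)
```

With three objects this is trivial; at hundreds of millions of objects, the list-then-fetch loop is exactly the work a prebuilt index avoids.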

Jed Sundwall: STAC and a lot of cloud-optimized approaches are an attempt at standardizing, or finding patterns whereby we can break up all of this content in ways that are manageable. That has to do with things like the STAC catalog, as you described, Matt, with all these JSON files pointing the way, and also things like naming conventions that all add up to make that stuff work. The only other thing I’ll say is that when we brought

Landsat onto AWS, the metadata that USGS would provide in its tarballs with the imagery was this weird text file that was space delimited or something like that. Do you remember these? Yeah, the MTL files, right? And I was just like, you know, I think it’d be better if this were at least JSON. So we created a process that ran at the end of every image that we

Matt Hanson: MTL. Yeah. Yeah.

Jed Sundwall: brought in and turned into a COG: as soon as it all landed in the bucket, we would run a Lambda function to take that MTL file and turn it into a JSON version of it. And I think that was kind of the kernel of the first notion of doing something like this, where you should be able to get to an image and have a reliable little machine-readable, or easily parsable, bit of metadata that you can find right by it.
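The MTL-to-JSON conversion Jed describes can be sketched in Python. This is not the Lambda function that was actually used: it assumes the MTL layout of nested `GROUP = name` / `KEY = value` / `END_GROUP = name` lines terminated by `END`, the sample values are invented for illustration, and the number coercion is a choice made for this sketch.

```python
import json

# Invented sample in the assumed MTL layout: nested GROUP / END_GROUP blocks of KEY = VALUE lines.
SAMPLE_MTL = """\
GROUP = L1_METADATA_FILE
  GROUP = PRODUCT_METADATA
    SPACECRAFT_ID = "LANDSAT_8"
    DATE_ACQUIRED = 2015-06-01
  END_GROUP = PRODUCT_METADATA
  GROUP = IMAGE_ATTRIBUTES
    CLOUD_COVER = 12.34
  END_GROUP = IMAGE_ATTRIBUTES
END_GROUP = L1_METADATA_FILE
END
"""


def coerce(value: str):
    """Strip quotes from strings, turn bare numbers into int/float, leave the rest as text."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_mtl(text: str) -> dict:
    """Turn nested GROUP blocks into nested dicts."""
    root: dict = {}
    stack = [root]  # the innermost open group is stack[-1]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "END":
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "GROUP":
            child: dict = {}
            stack[-1][value] = child
            stack.append(child)
        elif key == "END_GROUP":
            stack.pop()
        else:
            stack[-1][key] = coerce(value)
    return root


print(json.dumps(parse_mtl(SAMPLE_MTL), indent=2))
```

Dates stay as strings here because neither `int` nor `float` accepts them; a fuller converter might parse them explicitly.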

Matt Hanson: Yeah.

Jed Sundwall: And then, just to close this off: there are a lot of people that are never going to run their own API. They can’t stand up a service, and there are a lot of data products out there that just need to land somewhere. If somebody else wants to index them, they can. And I think static STAC catalogs make that easier.

Matt Hanson: Yeah, yeah, yeah, exactly. Yeah.

Jed Sundwall: Okay, so now let’s talk about the blog posts that you wrote, the history of STAC. Give us the high-level overview. We’ve included the link as we’ve promoted this, and I’ll put it back in the chat, but it’s really good. It’s a comprehensive post, so summarize it quickly: tell the story again.

Matt Hanson: Okay. So yeah, it is a bit lengthy. I did these two blog posts. The first one I wrote a couple of years ago, and I always meant to write a part two, and two years passed. And then I’m like, you know what? I’ve long been wanting to do it, I had drafts in various conditions, and finally I’m like, this is the time, you know,

Jed Sundwall: Yeah, you did it.

Matt Hanson: just as we were publishing it, STAC was accepted as an OGC Community Standard, so it seemed like a good time to actually publish it. The most recent post is called “Why STAC was successful,” and it really looks at how on earth this effort, which started back in 2017

with 20 people in a small room at the Marriott in Boulder, turned into something that is now being adopted by commercial companies launching satellites as well as by space agencies. NASA, and USGS for the Landsat program, were definitely early adopters, and that helped a lot. So I talk about this idea of guerrilla standards.

And I tip my hat to Howard Butler on that, because I love the term guerrilla standards; it really encapsulates what this process is and how it’s different from traditional standards work. That’s a big part of it, and we could talk more about guerrilla standards. But it’s this grassroots approach where you get stakeholders that are interested in

doing something better and working within a community, rather than making a bespoke thing on their own, and you build something that will serve everybody’s needs. I think this is critical, because I’ll skip to the end a little bit again here and say that

the conclusion of this is that when we talk about standards, there’s really only one metric. Well, I say, as a bit of a joke, that there are three metrics that matter: adoption, adoption, and adoption. And that’s true. You can have the most elegant standard that could exist. You could spend lots of time making sure it covers every possibility and is very elegant and very nice.

Matt Hanson: But if it doesn’t get used, that’s not a success story at all. You can have something that’s maybe a little crappy, and if everybody uses it, it’s hard to argue that the crappiness was a bad thing. If everybody’s using it, it’s improving everybody’s lives and it’s making interoperability easier. And so the central thesis

Jed Sundwall: Yeah. Yeah.

Matt Hanson: of the post was that adoption is the only thing that matters. And then I examine how we drove that adoption. That’s the question, right? It was successful because it’s apparently been adopted pretty widely, so what was it that we did that drove that adoption? Part of that is the guerrilla standards approach of

getting stakeholders and champions, getting people excited about it, and having buy-in from people. That’s an important piece of this: when people are part of a process, they’re more likely to use it. Their concerns and their needs are being listened to, and they’re more likely to go back and champion it and use it internally for their own projects as well.

Jed Sundwall: Yeah.

Matt Hanson: And then another aspect is the simplicity of the core spec. This wasn’t about trying to make a standard for everybody and everything. This was about creating a spec that was going to meet 80% of the needs, and really focusing on what those needs were. How do we find data? How do we have consistent metadata across

different providers? How do we have something really simple, and how do we encourage people to use it? We encourage people to use it by creating an ecosystem of tooling so that there’s a low barrier to entry. So part of the guerrilla standards approach is that you need to start building implementations. At that first sprint back in Boulder, thanks to Rob Emanuele and Seth Fitzsimmons,

Jed Sundwall: Yeah.

Matt Hanson: we had a server working at the end of one day that was serving up NAIP data. I don’t think we went back to it. It doesn’t really resemble much of what STAC looks like today, but that wasn’t the point. The point was that we got some ideas together, we stood it up, it worked, and then we could continue to iterate on it. So let’s see what other

aspects of the post I feel like I should call out. The community collaboration is critical: having in-person sprints that are open for anybody to join is key as well. And I would be remiss if I didn’t mention the timing. I think this was just serendipity, perhaps,

but the timing of STAC was critical to its success. We were at a point where the public clouds were maturing, and we were starting to see more geospatial data on them. You were just talking about your effort on bringing Landsat to AWS. COGs were really starting to gain traction. There were lots of launches, an explosion of

private companies launching satellites. So there was a real need there: the previous methods weren’t working, and no one else was really solving that. It just filled the missing layer at exactly the right time.

Jed Sundwall: Yeah. Yeah. Yeah, no, it’s great. I mean, it’s

such a fascinating example of what we’re actually trying to do with this live stream webinar podcast thing: we know some things have worked, and we need to understand why they worked. What made the difference? It’s very easy to look back at failed attempts at foisting standards on the world, so many standards that have not been adopted at all,

despite all the good intentions and the need. So it feels mysterious why STAC was successful, but I think your post and everything you just said make it not a mystery. I think we can look back at the things that made it successful. And it’s actually kind of interesting timing: we got another comment on YouTube from, I don’t know if this username is bent quarter.

So, bent quarter.

Who knows? They’re asking: is there a GUI for building a STAC? Which is a super interesting question, because in everything you’re talking about, you got all these people together and it was easy for them, and you had a server running by the end of the day. The people we’re talking about are data practitioners. It’s a pretty esoteric, cool-kids club. These sprints aren’t huge; it’s a small group of people who really have practical experience and needs,

and who understand each other, which I’d say allowed you to gain traction really, really quickly. But yeah, we are at the point, I think, where this question, is there a GUI for creating a STAC, is an interesting one. It certainly wasn’t the priority, but where are we now?

Matt Hanson: Yeah, it is an interesting question. So the answer is no. There are interfaces for browsing catalogs: there’s STAC Browser, and we have a user interface that we stand up for Earth Search called FilmDrop UI, which is an interface for a STAC API. There are others out there as well; Microsoft Planetary Computer has a user interface. But these are all focused

on being able to search and browse existing APIs, not on creating your own. And I think that’s just because those are different user bases. The people building the STAC metadata are generally developers: you have a bunch of data, and you generally want to programmatically create

the metadata from it, like extracting the footprint or pulling the important fields from the original metadata or from the headers of the data file. That really is done in a programmatic way. I think someone might have created a user interface for creating collections.

It would just be a form where you can go in and fill things out. But it’s not a bad idea either, having some sort of user interface to make this easier. I think it would have to be combined with some back end where maybe you drag and drop a series of files, it tries to fill stuff in, and then it gives the user the option to add additional details,

and then extends that to ingesting a bunch of other scenes. Maybe there’s something there that actually could be useful and make it easier for users to make their own. There’s some CLI tooling for creating STAC: there’s rio-stac, which can be used to create a bare-bones STAC Item from COGs. But yeah, no one’s

Jed Sundwall: Yeah.

Matt Hanson: really built a GUI for building STAC.
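As a sketch of the programmatic route Matt describes (extract a footprint, pull a few fields, write an Item), here is a Python function that assembles a minimal STAC 1.0.0 Item from values a pipeline has already read. `make_item` and the sample scene are hypothetical, and this is not rio-stac's API: a real tool would read the footprint and timestamp from the raster headers, and this sketch ignores footprints that cross the antimeridian.

```python
import json
from datetime import datetime, timezone

COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def make_item(item_id: str, ring: list, acquired: datetime, cog_href: str) -> dict:
    """Build a minimal STAC Item from a closed [lon, lat] footprint ring.
    `acquired` should be timezone-aware; it is normalized to UTC."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # bbox derived from the footprint: [min_lon, min_lat, max_lon, max_lat]
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "properties": {
            "datetime": acquired.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        },
        "links": [],
        "assets": {"data": {"href": cog_href, "type": COG_MEDIA_TYPE, "roles": ["data"]}},
    }


# Hypothetical scenes; in practice these values come from each file's headers.
scenes = [
    ("scene-1",
     [[10.0, 45.0], [11.0, 45.0], [11.0, 46.0], [10.0, 46.0], [10.0, 45.0]],
     datetime(2020, 6, 1, 10, 0, tzinfo=timezone.utc),
     "https://example.com/data/scene-1.tif"),
]
items = [make_item(*scene) for scene in scenes]
print(json.dumps(items[0], indent=2))
```

Looping one function over every file is the point: nobody fills in a form per scene, though a GUI could plausibly configure such a loop, as Matt suggests.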

Jed Sundwall: Yeah, that’s an interesting question. But it also gets at a huge challenge, one that a lot of government executives, people working in policy, people working on workforce development, and people educating future leaders and data scientists need to understand: the volume of data that we’re working with is so large that

creating tools that are really designed for humans to click and drag and point at things and track with their eyes is not how it’s going to be done. Yeah. Yeah.

Matt Hanson: Right, right. It has to be programmatic. That’s why I said a GUI that allows you to set that up, like set up the programmatic creation of it, might be useful. But you’re right: you’re not going to manually create STAC for every scene, for every item or image.

Jed Sundwall: Yeah.

Jed Sundwall: No. Yeah. I mean, this is not to dismiss the idea. Should there be a GUI? It still remains an interesting question. But I think it reveals that STAC emerged because we suddenly found ourselves dealing with so much data that it required a purely programmatic approach at first.

Matt Hanson: Yeah. And those were the first use cases too, right? It was these big archives: Landsat, Sentinel, NAIP. It wasn’t small amounts of commercial imagery, because we didn’t have access to those; that cost money. So the primary use case was how we could make it easier for users to access public datasets.

Jed Sundwall: Yeah. Yeah.

Jed Sundwall: Right. I’m imagining now an entirely local use case. As I mentioned to Matt before we started streaming, there’s a mudslide in my neighborhood in Ballard. I don’t know any details about it, and I hope no one’s hurt, but literally right now there’s a mudslide in my neighborhood. You could imagine somebody going out there with a laptop and a drone, flying some imagery, producing a relatively small product,

and wanting to package that up in a nice, tidy STAC catalog that they can then get out somehow. I could see that as a very lay-person, not-touching-the-cloud, Dropbox-scale kind of thing that you could do. And maybe that’s a use case, like an emergency response type of thing.

Matt Hanson: Yeah, for sure. And some people, I think, have created STAC catalogs for small datasets like that. But that raises the next question, which is: how do people find the catalogs?

Jed Sundwall: Well, I want Source Cooperative to be a place where people find these things. So: brought to you by Source Cooperative. This is our podcast, so I get to do stuff like that. Well, thank you. Actually, that is a prompt to do what I said I was going to do. We’re going to do some housekeeping really quickly, because we know that some people have joined mid-stream.

Matt Hanson: Right, so there’s, yeah.

Yeah, you get that. Yeah, that was a lead-in for you to plug it.

Jed Sundwall: So this is Great Data Products. It is a live-stream webinar podcast thing brought to you by Source Cooperative, which is a data publishing utility that we manage. You can go to source.coop to learn about it. This is where we talk to data practitioners about their craft, and this month we’re talking to Matt Hanson about the SpatioTemporal Asset Catalog, or STAC, metadata specification, which has been wildly successful.

And to do a little bit more self-promotion: there’s Great Data Products, the live-stream webinar podcast thing, and we also published a blog post a little while ago called Great Data Products that I think has done pretty well. You can go to Radiant Earth at radiant.earth/great and read that. But I’m going to share something.

Let’s see if I can do this. Can I share my screen? Yeah, I’m going to share a window, in response to, again, the question about GUIs. This is a drum I’ve been beating for a really long time. This is a graph I’ve been talking about for many years, but it’s been enshrined in this blog post. I’m going to expound on this in a future post, but

it’s useful to think about how you maximize the usability of data, and why a programmatically accessible approach is so important. If you have raw data off of a sensor, it is not going to be that useful to that many people. There’s an inherent cost required to extract any sort of value from it, and satellite imagery is notoriously difficult here,

Jed Sundwall: and we can talk about all the reasons why that is. But what often gets funded is, “I want a thing that’s going to track mudslide risk in the Pacific Northwest,” for example. So you can spend a lot of money sorting through the data, processing it, creating an interface, and doing user testing to create a tool that helps you understand mudslide risk in the Pacific Northwest.

You’ve gone over this huge arc where you spend a ton of money, but then the potential value of the data is diminished again. So this is always my warning against focusing on GUIs or dashboards: by creating an interface like this, you’re making a ton of decisions about what the value of the data is. Instead, what we should be asking is:

how do we maximize the queryability of the data? Again, I call this the sweet spot graph. We have to find the place where we’re taking out a lot of the annoying, undifferentiated heavy lifting required to get the data into a queryable form, without overdetermining it. Anyway, I’m preaching to the choir with you, Matt, but I just

Matt Hanson: Yeah, no, you know what a great example of that is? Landsat. Let’s take a look at Landsat. There are two processing streams that Landsat has. They have an ARD process, which is in one projection, actually an Albers projection. There are five, maybe seven, different Albers projections, depending on the continent and the place on the earth.

Jed Sundwall: Peace.

Jed Sundwall: Yeah.

Jed Sundwall: What’s your favorite? Sorry, I’m just kidding. Yeah.

Matt Hanson: Favorite continent? Favorite Albers projection? I don’t know.

Jed Sundwall: I’m sorry, just go on. I’m trolling you. Sorry.

Matt Hanson: So there’s the ARD stream, and that’s distributed as these ARD tiles. And then there’s the regular stream of data, which delivers UTM tiles. So the question is, why these two different things? The reason is that people like UTM and are used to it because it makes a nice, pretty picture, but it introduces more errors than

the Albers projection does. The Albers projection minimizes the distortion errors from the original raw data. So I have this saying I like to use: as soon as you pick a projection, it’s the wrong one. And this ties into your graph, because if you want to maximize value, you should try to avoid making assumptions about how people are going to use that data.

Jed Sundwall: Right. Yeah. Yeah.

Matt Hanson: Projection is a perfect example. Rather than picking a projection that you think is going to be useful for everybody, pick the one that’s going to minimize the potential errors, because you know that people are going to reproject it. MODIS does this great. MODIS uses a sinusoidal projection, which is the best projection for minimizing distortions due to the orbit of the craft. Everybody hates it because it doesn’t make for very pretty pictures if you open it up and just look at it directly in QGIS.

It looks all wonky, but it really is the best choice for that space.

Jed Sundwall: Interesting.

Jed Sundwall: Fascinating. Oh, wow. Okay. You know your stuff. No, it’s great, though. We have a pretty interesting question on this note of what the right way to present data is, from the great Max Lenormand, who I’ll embarrass a little bit more: there’s no way we’d even be having this podcast if it wasn’t for

Matt Hanson: I know a couple things and I just keep on reusing the same stuff.

Jed Sundwall: Minds Behind Maps and the approach that he took with that. So he asked: does this still hold in a world where it’s so much easier to make custom dashboards, GUIs, and front ends with AI? I have a response to that, but I’m curious to hear what you think, especially since Element 84 does so much great work producing really interesting tools. What are your thoughts on this?

Matt Hanson: Well, I like UIs. They’re pretty, but they’re also pretty impractical, aren’t they? If we look at the data, 99.99% of the data out there, no one’s ever going to look at.

So I do think we spend an inordinate amount of time focusing on visualizing remote sensing data when that’s actually not a great use case, outside of demos and pretty pictures. Maybe journalists like that, if you’re telling a story. So it’s great that it’s easy to make custom dashboards.

And I’ve been working on some UI stuff recently, and it’s fun, but from a practical standpoint, I think we need to be focusing more on unlocking the value in the data with programmatic back ends.

I don’t know if that really answers the question.

Jed Sundwall: Well, yeah, I agree with you. I think UIs are maybe the wrong thing to be thinking about. I use this example all the time; I may have already mentioned it on this podcast, and I probably will again in the future. There are so many attempts at making Earth observation data useful for agriculture, especially in low- and middle-income countries, where it’s like, “It’s great, we’re going to give the farmers an app and then they’ll know what to do.” And I’m like, no one’s going to use your app.

The farmer’s not going to install your app. They’re not going to open it. It’s not going to become a part of their life. It’s possible; there are sticky technologies that do become part of people’s lives. But it is so expensive to make that happen, and it’s so rare for it to actually happen. My hypothetical Earth observation application for the farmer in a poor country is that suddenly

Jed Sundwall: they can get insurance for some reason. They don’t know why, but there’s a flyer for them to get insurance, or a salesperson comes and visits them and says, “Hey, we can actually sell you affordable insurance now.” The basis of that insurance product is Earth observation data, which allows the product to exist. It is a product of data, but the farmer doesn’t have to know anything about that. The value of it gets

built into the price of the insurance, and that’s how the value is delivered. Is there a GUI or some sort of UI to the data between the receipt of the data and the creation of that insurance product? Maybe, maybe not. But partially to answer Max’s question, in the age of AI, and I know, Matt, you’ve said stuff about this before, increasingly it’s just going to be a model doing all the analysis, and

Matt Hanson: Mm-hmm.

Jed Sundwall: And what’s derived out of it is going to be some sort of index or figure that gets put into a spreadsheet or database, or informs some other process. Yeah. Yes. Which, by the way, you can preview CSVs on Source Cooperative now, which is amazing. Yeah. Go.

Matt Hanson: That’s right. It’s tabular data. The future is tabular data. Yeah.

Matt Hanson: Nice, that’s great. All right, so this is a bit of a tangent, but I feel like it’s a good time to say this. I used to give a presentation where I talked about this a little bit, and this is going to seem like a tangent, but

Jed Sundwall: Go for it. That’s why we’re here.

Matt Hanson: You talked about the farmer getting the app. There’s another reason why that doesn’t really work, and it’s because remote sensing sucks. All right: RSS, remote sensing sucks. What I mean by that is, I’ve been in this space for a while, and if you look at old research papers and new research papers, take a look at land cover products, for instance.

You can get land cover products from different producers for the same year using the same data, and they might be 20 or 30% off, 20 to 30% disagreement with each other. That’s because a lot goes into how the image is formed. The entire big equation, the radiative transfer equation for

how that light propagates and forms the image, means a lot of variability. And when we talk about Level-2 data, we have atmospheric correction, which also introduces a tremendous amount of variability. So I have this issue with the ag community, and I think lots of other industries have done this as well: they’ve over-promised

and under-delivered on what remote sensing can do. Errors of 20 to 30% are not uncommon. But if you go to an engineer doing space exploration, or in any other engineering discipline, and say 30% errors are normal, they’re going to laugh at you, right? We didn’t send people to the moon with 30% errors; you’d miss the moon. So I think there’s an aspect here of

Jed Sundwall: Yeah.

Jed Sundwall: Right.

Okay.

Matt Hanson: having realistic expectations about what remote sensing is capable of. Traditionally, back before Landsat was available on S3, the people doing that work were scientists. So I don’t think it really came up that people were misusing remote sensing data in a bad way.

But once that data became available to the masses, and this ties in with some of the stuff you were saying before, everybody started using it. Startup companies started to leverage it to generate NDVI. I remember working with one company back then that was using that Landsat data to calculate NDVI. And the problem was, and I think I’ve told you this before, Jed, that the data was not appropriate for doing that. The original Landsat data that was on

AWS was Level-1 data. It wasn’t even Level-1 top-of-atmosphere reflectance; it was top-of-atmosphere prime, so it didn’t even account for the sun angle. And I think that ended up causing more of the same problem: people continually being over-promised what remote sensing can do. So that’s my issue with the ag community:

I feel like they’ve over-promised what it’s capable of. Remote sensing is very powerful, because I might not be able to measure water quality in a lake very well, within some error, but I can look at every lake in the world, every day. And what it’s really, really good at is looking at relative differences. So, time series.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah. Yeah.

Matt Hanson: Time series is where remote sensing really shines: being able to look at change over time and differences. And this leads into a whole other segue, which is why most commercial satellite data providers have bad business models.
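Matt’s 20 to 30% figure describes how often two land cover products disagree with each other. A minimal sketch of one simple way to measure that, overall per-pixel agreement, is below. It assumes both maps are already on the same grid and use the same class codes; the `percent_agreement` helper and the tiny 3x3 maps are invented for illustration and are not taken from any real product. Formal accuracy assessments go further (for example, confusion matrices built against reference samples), but the basic comparison looks like this:

```python
def percent_agreement(map_a, map_b, nodata=None):
    """Share of pixels (0-100) where two land cover maps assign the same class.

    map_a and map_b are equally sized 2-D lists of class codes on the same grid.
    Pixels where either map equals `nodata` are skipped.
    """
    if len(map_a) != len(map_b) or any(len(ra) != len(rb) for ra, rb in zip(map_a, map_b)):
        raise ValueError("maps must have the same shape")
    same = total = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            if nodata is not None and (a == nodata or b == nodata):
                continue
            total += 1
            same += a == b  # True counts as 1
    if total == 0:
        raise ValueError("no valid pixels to compare")
    return 100.0 * same / total


# Hypothetical maps with made-up class codes (1 = forest, 2 = cropland, 3 = water).
producer_a = [[1, 1, 2], [2, 2, 3], [3, 3, 3]]
producer_b = [[1, 2, 2], [2, 1, 3], [3, 3, 2]]
agreement = percent_agreement(producer_a, producer_b)
print(f"agreement {agreement:.1f}%, disagreement {100 - agreement:.1f}%")
```

Here three of nine pixels differ, so the two toy maps disagree on roughly a third of the area, the same order of magnitude Matt cites for real products.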

Jed Sundwall: Yeah.

Jed Sundwall: Okay. We should keep going down this path, I think.

Matt Hanson: No, it’s that they’re focused on this idea of selling imagery, right? Scene by scene, and there’s really limited use for that. Maybe for photogrammetrists, looking at it the way we originally used it: we have a high-resolution image and we’re going to look at it and identify things. But the real value in all of these archives of data is the time dimension.

Jed Sundwall: Right. That’s right. Yeah.

Jed Sundwall: That’s right.
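Matt’s point that remote sensing is best at relative change over time can be illustrated with NDVI, the index he mentions, computed as (NIR - Red) / (NIR + Red). The sketch below assumes Level-2, atmospherically corrected surface reflectance for a single pixel on every date, which reflects his warning that comparing NDVI across dates from the original Level-1 top-of-atmosphere data was not appropriate. The helper names and the reflectance values are hypothetical.

```python
import math


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); returns NaN when both inputs are zero."""
    total = nir + red
    if total == 0:
        return math.nan
    return (nir - red) / total


def ndvi_changes(series):
    """Turn [(date, red, nir), ...] into [(date, change_since_previous), ...].

    Assumes the series is sorted by date and that every observation is
    atmospherically corrected surface reflectance for the same pixel.
    """
    values = [(date, ndvi(red, nir)) for date, red, nir in series]
    return [(d1, v1 - v0) for (_, v0), (d1, v1) in zip(values, values[1:])]


# Made-up surface reflectance for one pixel on three dates.
pixel = [
    ("2023-06-01", 0.05, 0.45),
    ("2023-07-01", 0.06, 0.40),
    ("2023-08-01", 0.10, 0.20),
]
for date, delta in ndvi_changes(pixel):
    print(date, round(delta, 3))
```

The idea is that even if each absolute NDVI value carries some error, a consistently processed series makes a sharp drop, like the August one here, stand out.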

Matt Hanson: And so I hope for a future where those archives are unlocked. Maybe there’s a subscription model where you can access the whole entire archive. But this whole piecemeal, image-by-image approach seems a little ridiculous.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Jed Sundwall: Absolutely. Okay. Yeah, sorry. Yeah, yeah. We’re tipping into the philosophical, which is great; that’s what we get to do here. I like to say imagery is a metaphor for the data. Imagery is one way to see the data, because you want to see it, right? I went through this a bunch when I was at AWS building the open data program: I’d have executives asking, where do I see the pictures?

Matt Hanson: All right, there’s a bunch of things there.

Jed Sundwall: What will it look like? And I’m like, well, do you know what an S3 bucket looks like? It’s just a bunch of objects with names; it doesn’t look like much. We had the same issue when we started hosting Hubble Space Telescope data, where people were like, I want to see pictures of the galaxies and stuff. And I’m like, yeah, that would be cool, but that’s not what’s in here. This is telescope data in a weird format called FITS that has its own

Matt Hanson: Yeah. Right.

Jed Sundwall: great, wonderful people trying to figure out how to cloud-optimize it. But the imagery is a derived product made for a human to look at with human eyes. It’s just one tiny slice of how this data can be interpreted or used. So yeah, I do feel like I want to defend myself on the Landsat stuff. I’ll first of all just say

Matt Hanson: Yeah.

Jed Sundwall: I didn’t know what I was doing. I was just like, well, look, we’re going to bring the Landsat data onto AWS. I had some ideas that I bandied about with Peter Becker from Esri and Frank Warmerdam at Planet. I consider them the two people who said, you should do this internal tiling and overview thing that ultimately became known as the COG. And

Matt Hanson: Yeah.

Jed Sundwall: That was it. I was just like, well, we’ll just see what happens. But I guess my question is: is that a solvable problem? Is any data safe for public use and distribution?

Matt Hanson: Probably not. I mean, any data can always be misused. And don’t get me wrong: that move of Landsat to the cloud was huge. It really popularized Landsat. We wouldn’t be where we are today if that very important dataset wasn’t there. But the thing was that…

Jed Sundwall: Yeah, I don’t think so. Yeah.

Jed Sundwall: Thank you.

Matt Hanson: before that, the data wasn’t really available. Well, it was available to people, but they weren’t the ones using it. It was scientists, and anybody using it probably should have opened up the Landsat Data Users Handbook and read what the data was and what needed to be done to it in order to do things like compare NDVI across two different days, because you couldn’t do that directly.

But people did it anyway. And I could point to other data; I’m sure that happens all over the place. Education, right? Relying on experts is a good thing. These are things that companies need to do: value that expertise in the geospatial and remote sensing domains, and not just assume that because data is easily accessible

Jed Sundwall: Yeah. Right.

Matt Hanson: and you can easily find it, you can do things without really knowing what you’re doing.

Jed Sundwall: Right, right. Well, yeah, I advocate for permanent, constant vigilance and skepticism around everything. Look at the history of the internet so far, which was designed explicitly to improve the sharing of research data; that was Tim Berners-Lee’s goal: I want to be able to share stuff with my colleagues more easily. Epistemologically,

it’s very hard to say whether or not we’re better off. Yes, there’s a lot more information out there, and I would assume a lot of it is accurate and great and pristine in a lot of ways, but there’s never anything stopping anybody from twisting it, interpreting it, or turning it into a narrative that fits whatever their agenda is. Let me go to the comments again. Sig asked about WMS and how it made it easy to get

Matt Hanson: Yeah.

Jed Sundwall: many large raster images... well, I’ll just put it on the stream here: into legacy desktop and web apps. Might STAC be implemented in a similar fashion? It has been. I mean, Esri has supported STAC for a super long time. Do you have comments on that?

Matt Hanson: Yeah, yeah. There’s a new QGIS feature, a STAC plugin, that actually works fantastically. So yeah, I think that’s already happening.

Jed Sundwall: Yeah, it is happening. And then from CJ Levinson: “I’m curious to hear how this conversation extends to model datasets as opposed to remotely sensed data, and how this relates to Jed’s point of good data products being about making fewer decisions. Thinking about climate models, weather models, mostly modeling outputs, which would be the main geospatial artifacts.” So yeah, I mean, Element 84 has done some great thinking on

embeddings data products and things like that. I think that’s relevant here. What are your thoughts on this, man?

Matt Hanson: Yeah, well, there are a couple of aspects here. There’s the aspect of how these model datasets, generally large, homogeneous model datasets, fit in STAC. But I’m not sure that’s the question. Is that the question?

Jed Sundwall: No, it’s less about STAC and more about how we’re talking about data that’s fit to be shared and fit to be used. Now we’re dealing with data products that are just model outputs, where a model’s done a bunch of magic on them.

Matt Hanson: Right.

Matt Hanson: Yeah. So I think that gets into your curve, right? That modeled output sits on the curve, and generally speaking, I think that’s what we want. This is what users want: they want the modeled output. They don’t want Level-2 Landsat data. They don’t even want Level-3. What they want is something like Planet’s Planetary Variables. That dataset is exactly the type of thing

that we need to see more of, I think. This isn’t imagery, and this isn’t a time series; it’s, I’m looking for a particular type of data variable, and I can get it. It’s been derived from imagery, but it’s gone through a process that weeds out all those edge cases. I think Planetary Variables are great.

That’s a great data product right there.

Jed Sundwall: Yeah. This is the time to shout out Dynamical and their podcast, which is called Weathering, which is an absolute delight. It’s from the people who built Upstream Tech. They have this great podcast where they actually read papers on weather forecasting and advances in weather forecasting. And in a recent episode, if I

let me see if I can remember which one it was. I think it’s the one

on... yeah, that’s the most recent one: “a taxonomy of bias, sense-making, heretical physics and the Tom Hanks, Bill Murray multiverse.” It’s a good episode. I think they discuss how we already interact with a lot of models and develop opinions of them over time based on their usefulness, right? Like you were saying before, a lot of satellite imagery has these insane error rates, or whatever. They just have

substantial error rates, right? They still might be useful. There’s the adage that all models are wrong, but some are useful. So I guess I’m just going to agree with you: this is what we want, models that are able to distill data into things like planetary variables, basically things that can support decision-making. And I think people aren’t idiots.

They’ll figure out whether something is useful to them or not. It’s possible that sometimes the model gives you something catastrophically bad and you lose money on it, and you’ll be able to decide whether or not you want to trust that model again. That’s the way the world works. I think it’s so easy to overthink this sort of stuff.

Matt Hanson: Right.

Matt Hanson: Mm-hmm. Mm-hmm.

Jed Sundwall: Man, I’ve missed out on LinkedIn. People have been saying stuff.

Matt Hanson: Uh-huh. Okay, before you do that, have I told you about my Star Trek theory of remote sensing? Have I ever? Okay. Well, we’re on a podcast, so I’ll have to explain it anyway, even if you had said, yes, I’ve heard this before. So if we look at Star Trek, my whole vision of the future, I hope, is way more Star Trek

Jed Sundwall: Yeah. Go. No, no, no. Go for it.

Jed Sundwall: I love this. Remind me.

Jed Sundwall: Yeah, yeah.

Matt Hanson: than a more dystopian version. In Star Trek, you have tricorders, right? You have sensors. And what do those sensors not do? They’re not sending back images that are then analyzed. You’re scanning for life. You’re scanning for a particular element. You’re scanning for specific variables. And I think there’s maybe an aspect here. We’ve been creating general-purpose satellites,

historically, like Landsat, where it’s like, well, we don’t really know; this could be used for a bunch of different things. But increasingly, I think, we’re seeing companies coming up and creating satellites for specific verticals, selling the satellites, and satellite-as-a-service. And I think ultimately maybe that’s where remote sensing goes: there isn’t a satellite that’s taking an image, and then we’re downlinking that and then

figuring out a bunch of different use cases for it. Rather, and you see this with GHGSat, it’s like: no, this is a satellite for detecting methane. It’s a single-purpose thing. It’s the Star Trek “scan for life.” It might actually be an optical satellite, or a SAR, or something like that, but it’s doing something on board and then sending back just the thing.

Jed Sundwall: Yeah.

Jed Sundwall: Right.

Jed Sundwall: Yeah, okay, sorry, now I’ve got it, this is great.

Jed Sundwall: Go Star Trek. It’s funny, I’m not a Trekkie by any means. I did watch The Next Generation a bit when I was a kid and really liked it. But I brought Star Trek up at a recent open data event I was at, because people were asking, are there any examples of literature or stories about the future of technology where things are good? And I’m like, I think Star Trek is one of those, you know? Yeah.

Matt Hanson: Yeah.

Matt Hanson: yeah, yeah, it’s, yeah.

Jed Sundwall: Because we’re just so steeped, and have been for many years, in dystopian technological stories and stuff like that. I think we should keep Star Trek in mind as a vision of where we could take things. You reminded me, though: last week a bunch of our friends were at a National Academies of Sciences workshop on Earth observation and the future of data stewardship. And I pitched basically what you just said, in a way. I mean,

Matt Hanson: Yeah, absolutely.

Jed Sundwall: we worked within groups to come up with a 20-year strategy, and I had some license to kind of steer the Ouija board, as I would say. We were all hacking on these ideas, and this really wasn’t my idea; it came out of the group. It was this realization that we know a few things we want to accomplish in terms of governance and, let’s say, environmental management.

Matt Hanson: Right.

Jed Sundwall: So rather than looking at the next 20 years of Earth observations and thinking, well, what sensors do we need, what file format should they be in, what should the standards be, and who should pay for it, what I led with when I was reading out from the group was: if we’re thinking 20 years ahead, we should assume there will be more sensors. There are going to be more data products. There are going to be more models producing all sorts of stuff, and more users doing weird things that we could never have anticipated. And what we should probably do,

Jed Sundwall: and I cannot emphasize how hard this was for me to say out loud, is look at something like the Sustainable Development Goals. I like to make fun of the Sustainable Development Goals because they’re kind of a bunch of hot air: it’s nice that you created these goals, but really? Is anybody going to do anything about this? But the truth is, we should. So it’s just

Matt Hanson: No.

Matt Hanson: Yeah, we should. Yeah.

Jed Sundwall: I make fun of them. Sorry, everybody. It’s just that the UN doesn’t really have the ability to herd the cats that are nation states and get them to do stuff, right? I think that’s been demonstrated. But the Sustainable Development Goals are really good goals. So it’s like, hey, we really want to ensure that everyone in the world has access to clean drinking water. And going back to your point: what do we need to do that? It could be

Matt Hanson: Yeah. Yeah.

Jed Sundwall: any number of different types of sensors, and we should have some sort of entity that is actually held accountable for making the end result happen. Who knows what kind of sensors they’re going to use. We don’t need to specify that up front. It might turn out that we need something like GHGSat, and the community driving at that specific goal can determine that.

Matt Hanson: I know.

Matt Hanson: Yeah. And they’ll need dedicated satellites to do that, right? With shared satellites serving all these different use cases, there’s just not enough tasking capacity. The power is in time series, and you’re maybe lucky to get an image every other month. You really need a dedicated satellite for the purpose, I think.

Jed Sundwall: Hmm.

Jed Sundwall: Interesting.

Okay, I don’t have strong opinions about this. I’ve kind of always thought there’s likely latent capacity in the satellites we already have up that people just can’t get access to, right? So I’m a huge fan of Common Space, for example. Well, it’s an example worth debating. I mean, we were fiscal sponsors of Common Space, you know,

Matt Hanson: Yeah, I mean, there might be, but yeah, Common Space, right? This is a great example.

Jed Sundwall: Bill is listening in here. It’s a glorious initiative. But I think there’s still plenty of debate to be had: does Common Space need its own satellite? Or is there actually just a legal, financial, or policy hack that could make existing sensors useful for the humanitarian realm? It might be easier just to launch your own satellite at this point, which is why I’m glad they’re trying to do it. But I think it’s a worthwhile debate.

Matt Hanson: Yeah, I think, yeah, my sense is that it is. Especially if you want full control over it and you want to revisit the same areas over and over again. Even for a disaster, right? We focus on

imagery right after some disaster, but ideally you’d want to keep looking at that same area for months afterward to follow the recovery efforts. Or if there’s flooding, how long does it take for the flood waters to recede? I just don’t see how you could get that much data unless you’re actually controlling the satellite and its ability to look at the same areas over and over again. Same thing with infrastructure, right?

Companies that own and operate global infrastructure: it totally makes sense for them to own their own satellites, pointing at the exact same areas day after day.

Jed Sundwall: Yeah. Huh. I wonder, is Munich Re going to fly its own satellites soon? You know, it seems like, yeah.

Matt Hanson: I mean, it’s getting more and more cost-effective, right? We’re seeing companies pivot toward, you know what? We’re not actually going to sell pixels anymore. We’re going to build satellites. Big companies, and lots of countries in the world too. This seems like where the business is heading: smaller, cheaper, purpose-built satellites.

Jed Sundwall: Yeah. Yeah. All right, we’re in agreement. I mean, this is again what I was saying at this National Academy of Sciences thing last week. There were plenty of people who were like, oh no, only the government can do this, everyone knows that. And I’m like, I don’t think that’s true. I think we’re going to see more satellites being flown by more actors. Linda’s chiming in on LinkedIn saying she agrees with the need for dedicated, purpose-built satellites. And then, yeah, Bill’s open to the debate, but yeah, I think there’s a

Matt Hanson: Yeah.

Matt Hanson: Nice.

Jed Sundwall: I find this compelling. I want to get to something Tim Bailey asked earlier. He said there’s an issue with human inspection to validate interpretation. He says, “I work in the forest wildfire resilience field, where there’s a stampede of new data products that are not great data products.” So yeah, we’re going back to the error rate issue, and kind of the issue of

He posted this a while ago, when we were talking about models and accuracy and actually informing decision support systems. Also relevant to this: I think Bloomberg published a story that’s been going around on LinkedIn this week about Zillow removing climate risk information from its listings. I think

Zillow and Redfin used to show flood risk and fire risk. This data comes from First Street Foundation, and they took it off. The issue is that we are increasingly encountering decision support information, like the fire risk for your house or for a house you’re thinking about buying, but it’s coming from entities that people aren’t sure they can trust.

And I think First Street, kudos to them, has demonstrably produced models that are better than FEMA’s models or anything the government’s been able to produce. But validating that sort of information is still difficult. I’m perceiving a need for what I always call new data institutions: arbiters that can actually help validate this stuff. Anyway.

Matt Hanson: Yeah.

Jed Sundwall: Over to you, in case you have thoughts. I want to go back there.

Matt Hanson: Yeah. Yeah. So I think Tim has a great idea for a name. You could start another podcast called Not Great Data Products and evaluate really crappy data sets. Like, this is the worst, you know, it’s like,

Jed Sundwall: We should do special episodes every now and then where we just talk trash about...

Matt Hanson: Yeah. Yeah. Not great. Yeah. This is the worst. So yeah, I feel like I just keep on ranting on this podcast. I think we have a real problem with startup companies especially. I don’t know, maybe this is a worldwide problem. I mean, I see it

Jed Sundwall: Do it! That’s the whole thing.

Matt Hanson: really prevalent here in the US. Startup companies doing really questionable science. Because they’re at odds, right? The business model they have is completely at odds with the scientists. We’ve seen this in very high-profile cases outside of the geospatial industry, but we see it in the geospatial industry as well.

And people make promises for things that really just aren’t practical, overpromising and underdelivering. So yeah, I’m not surprised that Tim has come across a lot of really not great data products. I don’t know about the source of those, but I’ve seen that quite a bit.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah. Well, I mean, look, it’s constant. And I’ll say, this is why I’m at Radiant Earth, right? It’s not why I left Amazon. Amazon is great; I had a very good eight years there. But what I realized was, no, we do need institutions that understand how to provide data but that aren’t owned by investors, right? So they don’t have

Matt Hanson: It’s just constant.

Matt Hanson: Yes.

Jed Sundwall: the same sort of forever-growth incentive. Which is not to say, and I should say, that some of my best friends are investors. That might not be true, but I have plenty of friends who are investors or are funded by investors. I don’t think investors have inherently malicious intent. What I would say is that investor-owned or investor-governed companies that are united by the need to grow constantly

are not always going to be the best stewards of data. And I would say in almost all cases, they almost can’t be. The pressure to enshittify is unavoidable. The competitive need also precludes them from being truly open about their models and how they operate, right? It has to be secret sauce, which I think, if you’re saying,

Matt Hanson: Mm-hmm.

Jed Sundwall: If you’re going out there and saying, hey, we have the data that is going to be used to regulate the environment and the real estate market, and to assess risks to human life, to life on Earth, you need to be held to a higher standard than just saying, it’s good, trust us, it’s our proprietary secret sauce. So.

Matt Hanson: Yeah. Yeah, yeah. And we can bring that back to STAC, actually, because a few years ago, well, there’s been an effort coordinating STAC with CEOS. Matthias Mohr has done a bunch of this work, and I was involved. So CEOS, which is an international committee of space agencies,

Jed Sundwall: yes. Yeah. Bring it home.

Jed Sundwall: Well, yeah, we’ve been, he does that under the umbrella of Radiant Earth,

Matt Hanson: has a thing called CEOS ARD, analysis ready data. And so Matthias has been doing work on mapping their requirements for ARD back to STAC. When I was involved with this a bit some years ago, in the early days, we were identifying which fields really need to be included in

this for data to get the ARD certification, or whatever it is, from CEOS. And I think the immediate problem I saw was that you really need, and they require, radiometric and geometric accuracy to be published in that metadata. I could be wrong, but I don’t think there are a ton of commercial

satellite companies that are really willing to do that.

Jed Sundwall: Interesting. Because of the proprietary nature of what they do?

Matt Hanson: Because their satellites suck, for the most part, because they’re CubeSats. They’re low-cost, cheap things. Now, maybe I’ll get a whole bunch of people mad at me, which somebody told me recently means you’re doing something right. But I don’t want to make a blanket statement about all of it. I love satellite companies, right? Some of my best friends are satellite companies.

Jed Sundwall: Okay.

Jed Sundwall: Amazing.

Jed Sundwall: Some of my best friends are satellite. Yeah.

Matt Hanson: But the realistic assessment is that these are lower-cost, cheaper satellites, and their radiometric accuracy is not going to be up to snuff compared to giant, school-bus-sized satellites like Landsat.

Jed Sundwall: Yeah. Well, interesting. I mean, this is a solvable problem. I think you’re just highlighting that it needs to be solved, if we are talking about a future in which more people are deploying sensors and more low-cost sensors are going up. A lot of those are going to be CubeSats. But again, I guess the requirements are going to be bespoke in the case of every sensor,

Matt Hanson: Mm-hmm.

Jed Sundwall: to determine, okay, does this meet our needs? You know, I’m a reinsurer. I need to have control of my own satellite that I can task, but this is what I need. Interesting.

Okay. I put links in the chat to a blog post that Matthias wrote about the cloud-native approach to doing this stuff, or to ARD. And shout out and thanks to NASA for funding us to do that work with Matthias, because it’s been great. Okay. Well,

We’ve covered a lot of ground here. I mean, I love talking to you. This is one of the great things about doing this: I don’t know the last time I had an hour and a half or so to just talk to you about stuff. So it’s been a real treat for me. Is there anything else you want to mention? We didn’t talk about my white paper. Did you read my white paper? Okay.

Matt Hanson: Yeah, I did, yesterday. I mean, there’s a lot of great alignment on a lot of things. There’s one thing: perhaps we can talk about this credibility issue. As I told you yesterday, I wrote this blog post, and afterwards a colleague of mine

Jed Sundwall: Yeah, yeah.

Matt Hanson: was like, well, there’s something missing from this post: it seems like there’s something else required here that you didn’t mention. And I think that thing is this credibility issue. What I mean by that is, this happens a lot, right? Some random person creates a really cool thing, and then they go out there and say, hey, help me with this thing, I want to create a standard. And they just might not get a whole lot of

traction from that. With STAC, we had some credibility because of Chris Holmes. Chris started it, and he had a good reputation. He’d been involved with OSGeo, he knew a lot of people, and he brought that credibility to it. And we see companies, like you mentioned, the New York Times with RSS, or Google and Meta, they

Jed Sundwall: Yeah.

Jed Sundwall: Yep, yeah, that’s right.

Matt Hanson: come out with standards all the time, because they have this credibility. They’re not guerrilla standards, right? They don’t actually build them in a community, but they have enough credibility and weight behind them that they can accomplish a similar thing: this is a standard, use it, and people start using it.

Jed Sundwall: Yeah. For some reason I feel very compelled to share, and I’ll also put in the chat, a link to “You Just Haven’t Earned It Yet, Baby” by the Smiths. Morrissey at his finest. It is a harsh truth that you will confront throughout your life when you’re trying to do anything: you do need to earn that credibility. Right. And so,

So my white paper is called Emergent Standards, and basically it’s an exploration of how standards emerge without an authority coming and saying, thou shalt do this, right? Linda just commented on LinkedIn that H3 and Uber is another great example, where Uber clearly knows what they’re doing and H3 was obviously good

Matt Hanson: Yes. Yep.

Matt Hanson: Mm-hmm.

Jed Sundwall: for what it does. And they opened it up, and it’s great, and now we talk about H3 a lot. So it’s this interesting thing; it’s all about sweet spots. We have many examples of institutions that are powerful and have sway in a lot of ways trying to decree standards that just don’t work, because

they are not actually aligned with what practitioners want. And so practitioners can come up with their own thing, but you still have to have a Chris Holmes in the group. You have to have somebody who has the convening power or the credibility or something to actually get people to pay attention. Which is a drag, because it’s like, well, how do you do that? And I actually don’t know.

Matt Hanson: Yeah, exactly.

Jed Sundwall: It feels like a historical accident in cases when stuff like that works out. And that’s probably actually true. Most of history is a series of accidents. Yeah. Yeah.

Matt Hanson: I think that’s true. Yep. There’s been a lot of research into this, and you’re probably more familiar with it than I am. If you look at the path of Bill Gates and other folks who have become billionaires or founded big companies, it’s a lot of being in the right place at the right time. It’s a lot of happenstance. It’s a lot of luck. It’s not just because he was brilliant and he just did stuff, and it was like, that

Jed Sundwall: yeah.

Matt Hanson: If Bill Gates or Elon had lived in another time, right, they wouldn’t be the billionaires they are today. Our whole life is pretty much dictated by luck.

Jed Sundwall: Yeah. Yeah. Actually, one final bit of self-promotion that I’m allowed to do here: in the latest episode of Texts on Texts, my other podcast, about literature, we talk about a short story called “Anxiety Is the Dizziness of Freedom” by Ted Chiang. It’s awesome, and it’s totally relevant to what you just said, which is

Matt Hanson: Okay, cool.

Jed Sundwall: It describes a device where you flip a switch and it creates a parallel universe that you can communicate with. So you can communicate with what’s called a paraself, a parallel version of yourself. And it drives people crazy; it causes all sorts of issues for people. There’s a guy who’s like, my paraself has a girlfriend and I don’t, what’s wrong with me? Basically, it reveals to people how much of their lives are

Matt Hanson: that.

Jed Sundwall: pretty much out of their control. Anyway, we’ve gone very far afield, but to bring it back to STAC and how this stuff is created: you do just have to try to do this sort of stuff. You’ve got to try. I think what STAC has demonstrated is that it is possible. And I do think that there are

Matt Hanson: You gotta try.

Jed Sundwall: parts of this playbook that can be documented and repeated. But part of that includes, as you said before, building with the community and finding champions. And you have to do that on purpose. So.

Matt Hanson: Yeah, you do. Yeah. And even just being engaged with the community, even if you are building stuff internally: I do feel like the more you are engaged with the community, the better that thing is going to be. So if you’re working with STAC, even if it’s for internal use, come to the STAC community meetings and let people know what you’re up to. Maybe you’ll get some good feedback.

Jed Sundwall: Yes.

Matt Hanson: You’re definitely going to be better off, I think. You’re going to be in a better position the more you work with a larger, more diverse group of people.

Jed Sundwall: Absolutely. Well, yeah, where should we point people? We can send people to stacspec.org, where you can learn everything you need to know. As far as getting involved in the community meetings, where do we point people?

Matt Hanson: There’s a Google group. Is it on the webpage?

Jed Sundwall: I’m looking around, and I’m noticing the stacspec.org site is directing people to our Discourse, which we don’t support anymore. So.

Matt Hanson: Okay, yeah, so there are some more things we need to do. The STAC Steering Committee, I think, has a meeting in the next week, maybe tomorrow, and we’re trying to clean up some of these things. So, yeah.

Jed Sundwall: All right, well, stay tuned then: stacspec.org. Matt, you’re easy to connect with on LinkedIn and places like that. You can also join the Cloud-Native Geospatial Forum. There are plenty of people in our Slack, though we do ask people to pay to join that; it’s not a lot of money. But yeah, there are lots of places to get involved. Stay tuned, and look at stacspec.org and see what you can find there.

Matt Hanson: yeah, yeah.

Jed Sundwall: All right, this has been awesome. Thanks, Matt, for coming on. I predict that we’ll have you on again, because we’ll be doing this forever. And thanks for everything you’ve done for the community.

Matt Hanson: Bye.

Matt Hanson: Yeah, no, thanks for doing this, Jed. Yeah, no, it’s been, this has been fun. I love the chat, so, you know, anytime.

Jed Sundwall: Any time. All right. Well, happy holidays. All right. Bye.

Matt Hanson: All right, you too. Bye bye.

Jed Sundwall: Okay, stay in