Great Data Products


A podcast about the ergonomics and craft of data. Brought to you by Source Cooperative.

Episode 4: How Standards Emerge: Lessons from STAC


Video also available on LinkedIn

Show notes

Jed talks with Matt Hanson from Element 84 about the SpatioTemporal Asset Catalog (STAC) specification and its role in making geospatial data findable and usable. Matt describes STAC as “a simple, developer-friendly way to describe geospatial data so that people can actually find it and use it.” The conversation covers how STAC emerged from a 2017 sprint in Boulder with 20 people and grew into a specification now adopted by NASA, USGS, and commercial satellite companies worldwide.

Matt promotes Howard Butler’s concept of “guerrilla standards” – a grassroots approach where stakeholders build something that serves everyone’s needs rather than making bespoke solutions. The central thesis: adoption is the only metric that matters. You can have the most elegant standard, but if nobody uses it, it’s not a success. STAC succeeded through community collaboration, simplicity of the core spec, an ecosystem of open source tooling, and timing—arriving just as cloud storage matured and satellite data exploded.

The conversation ranges into the limitations of remote sensing (“Remote sensing sucks,” Matt says, pointing to 20-30% error rates in land cover products), the future of purpose-built satellites, and why new data institutions are needed to validate emerging data products. Matt and Jed also discuss the credibility problem: launching a successful standard requires champions who have earned trust in the community. As Matt notes, “You have to earn credibility” – there’s no shortcut to building the relationships that make standards adoption possible.

Takeaways

  1. Adoption is the only metric that matters — An elegant standard nobody uses isn’t a success. A “crappy” standard everyone adopts improves lives and enables interoperability.
  2. Guerrilla standards work through buy-in — When people are part of the process, their needs get addressed and they become champions who use the standard internally.
  3. Simplicity drives adoption — STAC focused on meeting 80% of needs with a simple core spec rather than trying to cover every possibility.
  4. Timing matters — STAC arrived when cloud storage matured, COGs gained traction, and satellite companies were launching rapidly. The previous methods weren’t working.
  5. Credibility can’t be skipped — Standards efforts need champions with established reputations. Chris Holmes’s involvement and relationships were essential to STAC’s early traction.
  6. Remote sensing has real limitations — 20-30% disagreement between land cover products is common. The value of remote sensing is in relative differences and time series, not absolute measurements.

Transcript

(this is an auto-generated transcript and may contain errors)

Jed Sundwall: Welcome to Great Data Products. This is a live stream webinar podcast thing from Source Cooperative where we talk to data practitioners about their craft. We do this every month and you can visit us at greatdataproducts.com to see previous episodes and find links to subscribe on YouTube or wherever you get your podcasts. If you follow Source Cooperative on LinkedIn, we notify people about it there also. And then we also have a Luma calendar. You can see the next episode on greatdataproducts.com, but we actually have episodes scheduled out in January and February that you can see on Luma. I’ll talk about that in a minute. But today we’re joined by Matt Hanson from Element 84, a good old friend, I would say. And we’re going to talk about the SpatioTemporal Asset Catalog specification. Matt, do you want to introduce yourself?

Matt Hanson: Yeah, thanks Jed. Really happy to be here. Thanks for inviting me. I’m Matt Hanson. I work at Element 84, and I’ll give a brief background. I’ve been working in the remote sensing field for, geez, close to 30 years now. I got into open source about 15 years ago. I went to FOSS4G and was instantly like, this is it. This is what I want to do.

I started contributing to GeoNode, which was my first open source project. And then I started working on other projects and eventually got into STAC and standards.

Jed Sundwall: All right.

Jed Sundwall: Nice. Well, I can say we’ve been lucky to have you in the community for a long time. And we’ve got a lot to talk about. You’ve done a lot, you’ve accomplished a lot. And I would say your involvement in STAC has really secured your legacy. I mean, among others; it’s a community effort, which is partially what we’re going to talk about here. So, you recently... boy, let’s actually back way up.

How do you describe STAC to people? With the caveat that this podcast is not necessarily a geospatial podcast. We do want to reach more people who don’t necessarily have expertise in geospatial. So how do you describe STAC at a very high level?

Matt Hanson: Yeah, so STAC is, well, I describe STAC as a family of specifications as well as an open source ecosystem. And that’s maybe not a really layman’s way to describe it. So let’s say that it’s a simple, developer-friendly way to describe geospatial data so that people can actually find it and use it. That’s the quick one-sentence version.

Jed Sundwall: Okay. So I’m going to play layperson. And I actually don’t even have to pretend that much. I’m actually this naive in a lot of ways. I’ve heard Mark Korver, another esteemed colleague in this world, describe STAC as solving the problem of listing objects in S3.

Of course, I can’t help but be very nerdy here, but part of the problem that we’re facing, and it’s not just in the geospatial community, it’s in many other domains, is that we’re dealing with so much data that even just listing the files that you have is expensive. It takes time. And so you can imagine having a corpus of millions and millions of satellite images and you have to go through that haystack to find stuff. One way to characterize STAC is that it helps make it

Matt Hanson: Mm-hmm.

Jed Sundwall: basically easier to index all that stuff to find what you want. Is that fair to say?

Matt Hanson: Yeah, I think that’s definitely fair to say. Tying it to S3 is not necessarily required, right? STAC could describe data files wherever. It doesn’t have to be in object storage. But no, I think that’s a good way to talk about it. When I give a STAC presentation for new folks, like a STAC 101, I often will talk about exactly this issue of the explosion of geospatial data. There’s been so much data, and if you look at just NASA’s holdings and their projected holdings over the next five years, we see so much data. I have this saying that if your data is not indexed, it might as well not exist. Because if nobody can find the data and, as you say, you’re just getting a listing of all the files, how can you actually find the data that you want if there’s a billion files in object storage, let’s say?

And that’s not far-fetched. That number is not all that far-fetched. Yeah. No, as I was saying, if you look at the entire Sentinel-2 archive, there’s 25 million scenes. And for each scene, there’s 20 files. So it starts adding up really

Jed Sundwall: No, yeah, not at all. Go ahead.
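To put the Sentinel-2 figures quoted above in perspective, here is a back-of-the-envelope sketch. The scene and file counts come from the conversation; the average key length of 100 bytes is an assumption made purely for illustration, and the page size reflects the 1,000-key maximum that a single S3 ListObjectsV2 request returns.

```python
# Back-of-the-envelope sizing for the Sentinel-2 figures quoted in the conversation.
SCENES = 25_000_000        # scenes in the archive, as quoted
FILES_PER_SCENE = 20       # files per scene, as quoted
ASSUMED_KEY_BYTES = 100    # assumption: average object key length in bytes
LIST_PAGE_SIZE = 1_000     # S3 ListObjectsV2 returns at most 1,000 keys per request

total_files = SCENES * FILES_PER_SCENE
# Ceiling division: number of paginated list requests needed to see every key.
list_requests = -(-total_files // LIST_PAGE_SIZE)
# Size of the bare list of key names, with no metadata at all.
listing_gb = total_files * ASSUMED_KEY_BYTES / 1e9

print(f"{total_files:,} files")
print(f"{list_requests:,} list requests")
print(f"~{listing_gb:.0f} GB of key names alone")
```

Under these assumptions, simply enumerating the archive takes hundreds of thousands of requests before any filtering by place or time can begin, which is the gap an index is meant to close.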

Jed Sundwall: Okay. And then when you say easy, easy for who? You know, STAC stores its data in JSON. So who’s the typical user of STAC? What kind of software do they use? What kind of job title do they usually have? Yeah.

Matt Hanson: Yeah, geez, that’s a good question. I think that the ultimate data user is probably a data scientist. I think that’s who the original target was. When we first started looking at this, we were primarily looking at public data sets because that’s what is available. That’s what we were looking to index: NAIP and Landsat and Sentinel-2. And it was really a data science user problem. That was my background. That was where I come from: working with scientists and working with different types of data and having to use different formats and different tooling just in order to find and access the data. And so I think that really was the primary user. We talk about it being developer friendly because of the open source ecosystem, but that’s really developers working in tandem with data scientists in order to leverage and use the data.

Jed Sundwall: Great. Yeah. I mean, I’m leading you here a little bit, the point being that I’ve worked in the open data space for my entire career, basically, at this point. And so many conversations have revolved around making data easy for anyone, or something like that. And I argue that that hasn’t worked out super well. You actually need to find who the actual practitioners are that are going to use the data, and what they will be comfortable with, or what will actually help them, rather than having a kind of nebulous “everyone” thing. Yeah.

Matt Hanson: Yeah, yeah, it’s clearly not everyone. I mean, we have had journalists; people have reached out to us from the New York Times, and they’re creating stories and they want to access geospatial data. And so they’ve used some of the tooling around that. So that’s as close to a layperson, I think, as we’ve really worked with: journalists who want to tell a story and they just want to find data. They just want data from five years ago and today to look at a change over time and use it to write a story about it. And they were able to use the tooling, like PySTAC Client (even before that there was sat-search, an earlier tool set), and they were able to figure that out. But they were still leveraging developers to do that.

Jed Sundwall: Right. Well, but I think then there’s another clue here, which is that, to go on with journalists, you have an audience that typically has not been able to engage with imagery or geospatial data. But they are, and we’ve watched this happen throughout our lives, becoming more savvy and more aware of the need to be able to use software and data to tell stories and things like that. But they’re coming to us from a completely different place than I think most geospatial data practitioners were in previously. So the key there, I mean, you mentioned PySTAC; for whatever reason, a lot of journalists use Python. There are different communities that use different tools. Yeah.

Matt Hanson: Yeah, right. Sure, yeah. Language of data science, yeah.

Jed Sundwall: Yeah. Okay. We actually already have a question on YouTube from someone I’m just going to call Sig. I’m not sure if that’s his or her name. I can’t tell. But asking: STAC is built around sharing data easily to anyone. Let’s say you want to use it to share more secret data with access control, SSO, encryption, et cetera, and different users that have different access to different data sets. I have some thoughts on this, but as you mentioned, STAC doesn’t have to be explicitly tied to a cloud object store or a public bucket. Do you want to take that? I imagine you have some actual examples here. Yeah.

Matt Hanson: Yeah, so this question comes up a lot, right? I will get a little bit more technical here. We have an API called Earth Search that indexes public data sets on AWS. And that’s an implementation of STAC API. That implementation, for example, has no authentication in it, because we were using it originally to index public data. And so

we didn’t have a need for controlling access; all the data was public, and so we hadn’t added that. And so we get that question a lot. stac-fastapi is another implementation that didn’t have really core built-in authentication at the time that it was first created. So there’s a couple of ways to do this. I’ll jump to the end first, which is that there’s a more

modern solution for this. It’s called STAC Auth Proxy, which DevSeed has created. And that can be used to control access to individual items and collections based on attributes in the data. So that works pretty well. But what we’ve generally done is use a proxy. So you have your catalog and that’s open. Or it’s behind a firewall, but it’s available to anyone who can access it.

Jed Sundwall: Okay.

Jed Sundwall: Interesting.

Matt Hanson: And then we have a proxy in front of that. That handles the authentication, queries the catalog, knows what people can see, and then returns that result. So it’s going through the proxy. But these tend to be all one-off solutions that are created. So I think STAC Auth Proxy, if you haven’t seen that, is definitely something to look at that you can combine with stac-fastapi or potentially any STAC API implementation.
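The proxy pattern described here (authenticate, query the open catalog, return only what the caller may see) can be sketched roughly as follows. This is a minimal illustration and not how STAC Auth Proxy or Earth Search is implemented; the permission table, user names, and collection names are all hypothetical, and the upstream response is a hand-written stand-in for a STAC API search result.

```python
from typing import Any

# Hypothetical permission table: which collections each user may see.
USER_COLLECTIONS: dict[str, set[str]] = {
    "alice": {"public-imagery", "restricted-imagery"},
    "bob": {"public-imagery"},
}


def filter_search_response(user: str, response: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a search response holding only items the user may see."""
    allowed = USER_COLLECTIONS.get(user, set())
    visible = [
        feature
        for feature in response.get("features", [])
        if feature.get("collection") in allowed
    ]
    return {**response, "features": visible}


# Stand-in for what the open catalog behind the proxy would return.
upstream = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "scene-1", "collection": "public-imagery"},
        {"type": "Feature", "id": "scene-2", "collection": "restricted-imagery"},
    ],
}

for user in ("alice", "bob", "mallory"):
    ids = [f["id"] for f in filter_search_response(user, upstream)["features"]]
    print(user, ids)
```

Filtering after the query is the simplest form of the idea; a real deployment would more likely push the restriction into the query itself so that paging and result counts stay consistent.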

Jed Sundwall: Okay. So yeah, I mean, I think one thing I’ll underscore here also is that STAC is a metadata spec. It itself doesn’t say anything about authentication or anything like that. It’s been built to be very flexible, useful in all sorts of environments, and extensible. I want to just stay in the weeds of STAC a little bit longer. So the specification

Matt Hanson: That’s right.

Jed Sundwall: is made up of other specifications. So you have the idea of a, I’m going to go in order of, like, collection, catalog, and item. Can you walk through each of those and what they encompass? Sure.

Matt Hanson: Yeah, sure thing. Well, so we start up at the top. That’s a catalog. A catalog is really just a container. It’s JSON. It contains really simple fields. You have a name, you’ve got a title, you have a description. And then, most importantly, all of these entities within STAC have links. And links are probably the most important part of STAC, right? Because

Jed Sundwall: Yeah. okay.

Matt Hanson: when we got into this at the beginning, the ability to crawl a catalog was really important because that’s the way the internet works, right? By crawling things. And so we wanted to be able to link a whole catalog together, and link down to items and link back up, so that you could really visit any part of the data in this catalog and be able to crawl it in both ways.

So the catalog is the starting point. In an API especially, the catalog is your landing page, and it will contain links to the collections underneath it. And each collection really looks a lot like a catalog. A collection at one point even was a catalog; it was derived from a catalog.

Technically, that’s actually not the case anymore. It’s its own entity, but it looks a lot like a catalog. But collections are ways to group together items and data that are similar to each other. And so the most obvious case is when we look at the big public data sets. We see Sentinel-2 or Landsat, and, like, Sentinel-2 level two data, that is a collection.

Right, it contains a bunch of items, and that’s your next level down: an item. And an item is where we move from JSON to GeoJSON, because an item actually represents a specific location and a specific time or range of times. And that’s really where your data is. So you can think of it as a scene. You can think of it as a footprint containing data. The data is contained in what are called assets. So that’s really the fourth entity type, except assets are actually embedded directly in the GeoJSON of items. So you have the catalog, collections, and then items. And so that’s the general hierarchy. And we have links that allow you to go all the way down from catalogs to items. Now, there are some nuances between a static catalog

Jed Sundwall: Right. Okay.

Matt Hanson: what we call a static catalog, which is really just a bunch of linked JSON files on disk or blobs in an object store. And that’s an important distinction between that and a dynamic catalog, or what we call an API. And so there are nuances, because you can have, for instance, sub-catalogs within a static catalog.

That might be a little confusing, but it’s a way to partition the data, basically. So you can use sub-catalogs to organize it. You might have a collection, and then underneath that you’ll have, like, a catalog for each continent. Then you go into the continent, and that’s where your items are. So it’s just a way to partition and organize the data. In an API (this question comes up a lot, which is why I have the whole narrative around it here) you don’t need those sub-catalogs, because you don’t need to partition the data. You can search for the data on what continent it’s in, or what path/row it is if it’s gridded data, or you can essentially partition on the fly by anything you want. So that’s the important distinction between static catalogs and an API. We get the question a lot.

People have static catalogs and they ask, how can I search this? And you can’t really search it. You have to index it first. There’s that missing piece. But originally, Chris Holmes really wanted us to focus on being able to have static catalogs, because not everybody wants to stand up a server and incur the cost of that. They just want to make data available and they want to share it with people.

And so the easiest way to do that is just to have the metadata on disk. And it’s all linked to each other, so you can crawl it and index it if you want to do that.
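The hierarchy and the crawl-then-index step described above can be illustrated with a small sketch. The "files" below are a hypothetical, heavily trimmed static catalog held in memory (a real STAC catalog, collection, and item carry more required fields than shown here), and the ids, coordinates, date, and paths are invented for illustration. The crawler follows only "child" and "item" links downward, which is enough to reach every item from the root.

```python
import posixpath
from typing import Any, Iterator

# Hypothetical static catalog: each key stands in for a JSON file on disk
# or an object in a bucket. Fields are trimmed for brevity.
FILES: dict[str, dict[str, Any]] = {
    "catalog.json": {
        "type": "Catalog",
        "id": "example-catalog",
        "description": "Example root catalog",
        "links": [{"rel": "child", "href": "./demo/collection.json"}],
    },
    "demo/collection.json": {
        "type": "Collection",
        "id": "demo",
        "description": "Example collection",
        "links": [
            {"rel": "parent", "href": "../catalog.json"},
            {"rel": "item", "href": "./scene-1.json"},
        ],
    },
    "demo/scene-1.json": {
        "type": "Feature",  # items are GeoJSON features
        "id": "scene-1",
        "collection": "demo",
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        "properties": {"datetime": "2020-01-01T00:00:00Z"},
        "assets": {"image": {"href": "./scene-1.tif"}},  # assets live inside the item
        "links": [{"rel": "collection", "href": "./collection.json"}],
    },
}


def crawl(path: str) -> Iterator[dict[str, Any]]:
    """Walk child/item links from a catalog or collection and yield every item."""
    doc = FILES[path]
    if doc["type"] == "Feature":
        yield doc
        return
    base = posixpath.dirname(path)
    for link in doc["links"]:
        if link["rel"] in ("child", "item"):
            # Resolve the relative href against the current file's directory.
            yield from crawl(posixpath.normpath(posixpath.join(base, link["href"])))


# Crawling produces the index that a static catalog lacks on its own.
index = {item["id"]: item for item in crawl("catalog.json")}
print(sorted(index))
print(index["scene-1"]["assets"]["image"]["href"])
```

Once such an index exists, questions like "which items fall on this continent" become lookups rather than crawls, which is the role a STAC API plays for larger catalogs.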

Jed Sundwall: That’s right. Yeah. I mean, I can speak to this. This is a long time ago now, when all this stuff happened. It’s relevant, actually, to another comment or question from Sig on YouTube asking: did the chicken or the egg come first, i.e. STAC, or S3 and the cloud-optimized formats? I assume STAC wouldn’t exist with only old files on disk. So there’s a lot to respond to there. First I’ll say,

this is a fundamental issue about the distinction between file storage and object storage that is just not obvious to most people, because they never have to think about it. If you’re using a computer, like a laptop or a normal computer with a GUI and stuff like that, you are probably interacting with a file system. Your computer needs to have an understanding of what files are on your hard drive, and it has an index of them.

It also has an index of how the directories are nested and things like that. And you can search your computer for files and stuff like that. A lot of applications would be a huge pain to use if you didn’t have that index. Object storage like S3 has nothing like that. Object storage is just: you have a file, you give it a key name, and you put it in a cloud. And it’s there. If you know that key name, you can get it back out. And so

this is the issue, going back to the discussion before about too much data. You can imagine a scenario where you have so many objects, so many files you’re dealing with, that even the index of them would be too large for your laptop. Just listing the names of the files would be too large for a lot of people’s local storage. This is not a crazy idea, let alone metadata about all those sorts of things. And so

Matt Hanson: Hmm.

Jed Sundwall: STAC and a lot of cloud-optimized approaches are an attempt at standardizing or finding patterns whereby we can break up all of this content in ways that are manageable. That has to do with things like a STAC catalog, as you described, Matt, with all these JSON files pointing the way. And also things like naming conventions and stuff like that, which all add up to make that stuff work. The only other thing I’ll say is that when we brought

Landsat onto AWS, the metadata that USGS would provide in its tarballs with the imagery was just this weird text file that was space delimited or something like that. Do you remember these? Yeah, the MTL files, right? And I was just like, you know, I think it’d be better at least if this is in JSON. And so what we did is we created a process that happened at the end of every image that we

Matt Hanson: MTL. Yeah. Yeah.

Jed Sundwall: brought in and turned into a COG. As soon as it all landed in the bucket, we would run a Lambda function to take that MTL file and turn it into a JSON version of it. And I think that was kind of the kernel of the first notion of doing something like this, where you should be able to get to an image and you should have a reliable, machine-readable, easily parsable bit of metadata that you can find right by it.

Matt Hanson: Yeah.
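As a rough illustration of the MTL-to-JSON idea described above, the sketch below parses the nested GROUP / END_GROUP layout of a Landsat MTL text file into a dictionary and serializes it as JSON. It is not the Lambda function from the conversation; the function names are hypothetical, and the sample text is a shortened, made-up excerpt in the MTL style rather than real scene metadata.

```python
import json
from typing import Any

# Made-up, shortened sample in the style of a Landsat MTL file.
SAMPLE_MTL = """\
GROUP = L1_METADATA_FILE
  GROUP = PRODUCT_METADATA
    SPACECRAFT_ID = "LANDSAT_8"
    WRS_PATH = 1
  END_GROUP = PRODUCT_METADATA
  GROUP = IMAGE_ATTRIBUTES
    CLOUD_COVER = 12.5
  END_GROUP = IMAGE_ATTRIBUTES
END_GROUP = L1_METADATA_FILE
END
"""


def convert(value: str) -> Any:
    """Turn a raw MTL value into a string, int, or float."""
    if value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # leave anything unrecognized as plain text


def parse_mtl(text: str) -> dict[str, Any]:
    """Parse KEY = VALUE lines, nesting a new dict for every GROUP block."""
    root: dict[str, Any] = {}
    stack = [root]  # the innermost open group is always stack[-1]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "END":
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "GROUP":
            group: dict[str, Any] = {}
            stack[-1][value] = group
            stack.append(group)
        elif key == "END_GROUP":
            stack.pop()
        else:
            stack[-1][key] = convert(value)
    return root


parsed = parse_mtl(SAMPLE_MTL)
print(json.dumps(parsed, indent=2))
```

The result is a small JSON document that can sit next to the imagery under a predictable key, so any client can fetch and parse it without a format-specific reader.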

Jed Sundwall: And then I guess, just to close this off, also with the understanding that yeah, there are a lot of people that are never going to run their own API. They can’t stand up a service, and there are a lot of data products out there that do just need to land somewhere. And if somebody else wants to index them, they can. And I think the static STAC catalogs make that easier, I would say.

Matt Hanson: Yeah, yeah, yeah, exactly. Yeah.

Jed Sundwall: Okay, so now let’s talk about the blog posts that you wrote, the history of STAC. Give us the high-level overview. We’ve included the link to it as we’ve promoted this, and I’ll have to put it back in the chats and stuff like that, but it’s really good. But summarize it quick. It’s a comprehensive post. Tell the story again.

Matt Hanson: Okay.

Matt Hanson: Okay. So yeah, it is a bit lengthy. So I did these two blog posts. The first one I wrote a couple of years ago, and I always meant to write a part two, and two years passed. And then I’m like, you know what? I’ve long been wanting to do it. I had drafts in various conditions. So finally I’m like, this is the time, you know,

Jed Sundwall: Yeah, you did it.

Matt Hanson: just as we were publishing it, STAC was accepted as a community standard for OGC, so it seemed like a good time to actually publish it. The most recent post is called “Why STAC Was Successful,” and it really looks at how on earth did this effort, that started back in 2017

with 20 people in a small room at the Marriott in Boulder, how did this turn into something that is now being adopted by commercial companies that are launching satellites as well as space agencies? So NASA, and USGS for the Landsat program was definitely an early adopter that helped a lot. So I talk about this idea of guerrilla standards.

And I gave a hat tip to Howard Butler on that, because I love the term guerrilla standards. It really encapsulates what this process is and how it’s different than traditional standards work. And so that’s a big part of it. We could talk more about that, about the guerrilla standards. But it’s this grassroots approach where you get people that are interested, you get stakeholders that are interested in

doing something better and working within a community rather than making a bespoke thing on their own. And you build something that will serve everybody’s needs. And I think this is critical, because I’ll skip to the end a little bit again here and say that

the conclusion of this is that, when we talk about standards, there’s really only one metric. Well, I say there’s three metrics that matter, as a bit of a joke, which is adoption, adoption, and adoption. And that’s true. You can have the most elegant standard that could exist. You could spend lots of time and make sure it covers every possibility and it’s very elegant and very nice.

But if it doesn’t get used, that’s not a success story at all. You can have something that’s maybe a little crappy, and if everybody uses it, it’s hard to argue that the crappiness was a bad thing. If everybody’s using it, it’s improving everybody’s lives and it’s making interoperability easier. And so the central thesis of

Jed Sundwall: Yeah. Yeah.

Matt Hanson: the post was that adoption is the only thing that matters. And then I examine how we drove that adoption. That’s the question, right? It was successful because it’s been adopted pretty widely. And so what was it that we did that drove that adoption? Part of that is the guerrilla standards approach of

getting stakeholders and getting champions and getting people excited about it and having buy-in from people. That’s an important piece of this: when people are part of a process, they’re more likely to use it. Their concerns and their needs are being listened to, and they’re more likely to go back and champion it and use it internally for their own projects as well.

Jed Sundwall: Yeah.

Matt Hanson: And then another aspect is the simplicity of it. The core spec. This wasn’t about trying to make a standard for everybody and everything. This was about creating a spec that was going to meet 80% of the needs, and really focusing on what those needs were. How do we find data? How do we have consistent metadata across

different providers? How do we have something really simple, and how do we encourage people to use it? We encourage people to use it by creating an ecosystem of tooling so that there’s a low barrier to entry. So the ecosystem is part of the guerrilla standards approach: you need to start building implementations. And at that first sprint back in Boulder, at the end of the day, thanks to Rob Emanuele and Seth Fitzsimmons,

Jed Sundwall: Yeah.

Matt Hanson: we had a server working at the end of one day that was serving up NAIP data. I don’t think we went back to it. It doesn’t really resemble much of what STAC looks like today, but that wasn’t the point. The point was that we got some ideas together, we stood it up, and it worked, and then we could continue to iterate on it. So let’s see what other

aspect of the post I feel like I should call out. The community collaboration is critical, like having in-person sprints that are open for anybody to join. That is key as well. And I would be remiss if I didn’t mention the timing. The timing, I think, was just serendipity perhaps.

But the timing of STAC was critical to its success. We were at a point where the public clouds were maturing, and we were starting to see more geospatial data on them. You were just talking about your effort on bringing Landsat to AWS. COGs were really starting to gain traction. There were lots of launches and an explosion of

private companies launching satellites. So there was a real need there. The previous methods weren’t working, and no one else was really solving that. And so it just filled the missing layer at exactly the right time.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah. Yeah, no, it’s great. I mean, it’s

such a fascinating example of what we’re actually trying to do with this live stream webinar podcast thing, which is: we know some things have worked. We need to understand why they worked. What made the difference? It’s very easy to look back at failed attempts at foisting standards on the world, you know, so many standards that have not been adopted at all.

Right. Despite all the good intentions and the need and things like that. And so it feels mysterious why STAC was successful, but I think your post and everything you just said makes it not a mystery. I think we can look back at the things that made it successful. And it’s actually kind of interesting timing. We got another comment on YouTube from, I don’t know, this username is bent quarter.

So bent quarter.

Who knows? But asking, is there a GUI for building a STAC? Which is a super interesting question, because of everything you’re talking about. You talk about how we got all these people together and it was easy for them, and we had a server running by the end of the day. The people that we’re talking about are data practitioners. It’s a pretty esoteric cool kids club. These sprints, they’re not huge. It’s a small group of people who really have practical experience and needs

and who understand each other, which has, I’d say, allowed you to gain traction really, really quickly. But yeah, we are at the point, I think, where this question, is there a GUI for creating a STAC, is an interesting question. It certainly wasn’t the priority, but where are we now?

Matt Hanson: Yeah, it is an interesting question. So the answer is no. There are interfaces for browsing catalogs. There’s STAC Browser. We have a user interface that we stand up for Earth Search, called FilmDrop UI, that is an interface for STAC API. There’s others out there as well. Microsoft Planetary Computer has a user interface. But yeah, these are all kind of focused on

being able to search and browse existing APIs, not actually creating your own. And I think that’s just because those are different user bases. The people building the STAC metadata are generally developers. You have a bunch of data and you generally want to programmatically create the

metadata from it. So, extracting the footprint of it or pulling metadata fields that are important from the original metadata or from the headers of the data file. So that really is done in a programmatic way. I think someone might have created a user interface for creating collections.

It would just be a form where you can go in and fill things out. But it’s not a bad idea either, having some sort of user interface to make this easier. I think it would have to be combined with some back end where maybe you’re dragging and dropping a series of files. And then it’s going to try and fill stuff in, but then gives the user an option to add in additional details,

and then extend that to ingesting a bunch of other scenes from it. Maybe there’s something there that actually could be useful and make it easier for users to make their own. There’s some CLI tooling for creating STAC. There’s rio-stac, which can be used to create a bare-bones STAC item from COGs. But yeah, no one’s

Jed Sundwall: Yeah.

Matt Hanson: really brought up a GUI for building a STAC.

Jed Sundwall: Yeah, that’s an interesting question. But it also gets at, I think, a huge challenge. It’s a challenge that a lot of government executives need to understand, and a lot of people working in policy, people working on workforce development, people educating future leaders and data scientists: the volume of data that we’re working with is so large that

the notion of creating tools that are really designed for humans to click and drag and point at things and track with your eyes to do stuff, that’s not how it’s going to be done. It’s, yeah. Yeah.

Matt Hanson: Right, right. It has to be programmed. And so that’s why I said, you know, a GUI that allows you to maybe set that up, right? Like set up the programmatic creation of it. That might be useful, but you’re right. You’re not gonna manually create a STAC item for every scene, you know, for every image.

Jed Sundwall: Yeah.

Jed Sundwall: No. Yeah. I mean, this is not to dismiss the idea. Whether there should be a GUI still remains an interesting question. But I think it reveals the fact that STAC emerged because we suddenly found ourselves dealing with so much data that it required a sort of purely programmatic approach at first.

Matt Hanson: Yeah. And those were the first use cases too, right? It was these big archives. It was Landsat and Sentinel, it was NAIP. And it wasn’t small amounts of commercial imagery, because we didn’t have access to those. You know, that was money. So the primary use case was: how can we make it easier for users to access public data sets?

Jed Sundwall: Yeah. Yeah.

Jed Sundwall: Right. I’m imagining now an entirely local use case where it’s like, okay, as I mentioned to Matt before we started streaming, there’s a mudslide in my neighborhood in Ballard. I don’t know any details about it. I hope no one’s hurt or anything like that, but literally right now there’s a mudslide in my neighborhood. You could imagine somebody going out there with a laptop and a drone, flying some imagery, producing a relatively small product,

and wanting to package that up in a nice tidy STAC catalog that they can then get out somehow. I could see that as being a very layperson, not-touching-the-cloud, Dropbox-scale type thing that you could do. And that’s maybe a use case, like an emergency response type thing for something like this.

Matt Hanson: Yeah, for sure. And, you know, some people I think have created STAC catalogs for small data sets like that. But then that raises the next question, which is, how do people find the catalogs?

Jed Sundwall: Well, I want Source Cooperative to be a place where people find these things. So, brought to you by Source Cooperative. This is our podcast, so I get to do stuff like that. Well, thank you, thank you. Yeah, actually, that is a prompt to do what I said I was going to do. We’re going to do housekeeping really quickly, just because we know that some people have joined midstream.

Matt Hanson: Right, so there’s, yeah.

Yeah, you get that. Yeah, that was a lead-in for you to plug it.

Jed Sundwall: So this is Great Data Products. It is a live stream webinar podcast thing brought to you by Source Cooperative, which is a data publishing utility that we manage. You can go to source.coop to learn about it. But this is the time where we talk to data practitioners about their craft. And this month we’re talking to Matt Hanson about the SpatioTemporal Asset Catalog, or STAC, metadata specification, which has been wildly successful.

And then I’ll do a little bit more self-promotion on this. There is Great Data Products, the live stream webinar podcast thing. We also published a blog post a little bit ago called Great Data Products that has, I think, done pretty well. You can go to Radiant Earth at radiant.earth/great and read that. But I’m going to share something.

Let’s see, I don’t know if I can do this. Can I share my screen? Yeah, I’m gonna share a window in response to, again, the question about GUIs. This is a drum I’ve been beating for a really long time. This is a graph I’ve been talking about forever, many years, but it’s been enshrined in this blog post.

I’m gonna expound on this in a future post, but it’s useful to think about how you maximize the usability of data and why a programmatically accessible approach is so important. So if you have raw data off of a sensor, it is not going to be that useful to that many people. There’s an inherent cost required to extract any sort of value from it. And satellite imagery is sort of notoriously difficult here,

Jed Sundwall: which we can talk about, all the reasons why that is. But what often gets funded is like, I want a thing that’s going to track mudslide risk, you know, for example, in the Pacific Northwest, right? And so you can spend a lot of money sorting through the data, processing it, creating an interface, you know, doing user testing to create a tool that helps you understand mudslide risk in the Pacific Northwest.

You’ve gone over this huge arc where you spend a ton of money, but then the potential value of the data is diminished again. Right. And so this is always my warning against focusing on GUIs or dashboards and stuff like that: by creating an interface like this, you’re making a ton of decisions about what the value of the data is. Instead, what we should be trying to do is ask,

how do we maximize the queryability of the data? Again, I call this the sweet spot graph. We have to find this place where we’re taking out a lot of the annoying, undifferentiated heavy lifting required to get the data into a form that’s queryable, without overdetermining it. So anyway, I’m preaching to the choir with you, Matt, but I just,

Matt Hanson: Yeah, no, you know what a great example of that is too? Landsat. Let’s take a look at Landsat. There are two processing streams that Landsat does. They have an ARD process, which is in one projection. It’s actually in an Albers projection. There’s five different Albers, maybe seven Albers projections, depending on the continent and the place on the earth.

Jed Sundwall: Peace.

Jed Sundwall: Yeah.

Jed Sundwall: What’s your favorite? Sorry, I’m just kidding. Yeah.

Matt Hanson: Favorite continent? Favorite Albers projection? I don’t know.

Jed Sundwall: I’m sorry, just go on. I’m being, I’m trolling you. Sorry.

Matt Hanson: So there’s the ARD stream, and that’s distributed as these ARD tiles. And then there’s the regular stream of data, which delivers UTM tiles. So the question is, why these two different things, right? And the reason why is because people like and are used to UTM because it’s a nice pretty picture, but it introduces more errors

than the Albers projection does. The Albers projection minimizes the distortion errors from the original raw data. And so I have this saying that I like to use, which is: as soon as you pick a projection, it’s the wrong one. And so this is in the graph because, you know, if you want to maximize value, then you should try and avoid making assumptions about how people are gonna use that data.

Jed Sundwall: Right. Yeah. Yeah.

Matt Hanson: Projection is a perfect example. Rather than picking a projection that you think is going to be useful for everybody, just what’s the one that’s going to minimize the potential errors? Because you know that people are going to reproject it. MODIS does this great. MODIS does a sinusoidal projection, which is the best projection for minimizing distortions due to the orbit of the craft. Everybody hates it because it doesn’t make for very pretty pictures if you open it up and just look directly in QGIS.

Like it looks all wonky, but it really is. It really is the best choice for that space.

Jed Sundwall: Interesting.

Jed Sundwall: Fascinating. Oh, wow. Okay. You know your stuff. Yeah. No, no, it’s great though. We have a pretty interesting question on this note of what is the right way to present data, from the great Max Lenormand, who I’ll just embarrass a little bit more. There’s no way we’d even be having this podcast if it wasn’t for

Matt Hanson: I know a couple things and I just keep on reusing the same stuff.

Jed Sundwall: Minds Behind Maps and the approach that he took with that. So he asked, does this still hold in a world where it’s so much easier to make custom dashboards, GUIs, front ends with AI? I have a response to that, but I’m curious to hear what you think. I mean, especially since Element 84 does so much great work producing really interesting tools. Yeah, what are your thoughts on this?

Matt Hanson: Well, I like GUIs. They’re pretty, you know, but they’re also pretty impractical, aren’t they? If we look at the data, 99.99% of the data out there, right? No one’s ever going to look at.

And so I do think we spend an inordinate amount of time focusing on visualizing remote sensing data when that’s actually not really a great use case, outside of demos and pretty pictures. Maybe, you know, journalists like that, if you’re telling a story. And so, you know, it’s great that it’s easy to make custom dashboards.

You know, I’ve been working on some UI stuff recently and it’s fun, but I think from a practical standpoint, we need to be focusing more on unlocking the value in the data with, you know, programmatic back ends.

I don’t know if that really answers the question now.

Jed Sundwall: Well, yeah, I think I agree with you. I mean, I think GUIs are maybe the wrong thing to be thinking about. I agree with you on that in the sense that, and I use this example all the time, I may have already mentioned it on this podcast and I probably will again in the future, there have been so many attempts at making Earth observation data useful for agriculture, you know, especially in low and middle income countries, where it’s like, no, it’s great. We’re going to give the farmers an app and then they’ll know what to do. And I’m like, no one’s going to use your app.

The farmer’s not going to install your app. They’re not going to open it. It’s not going to become a part of their life. It’s possible. There are sticky technologies that do, you know, become part of people’s lives. But it is so expensive to make that happen. And it’s so rare for it to actually happen. My hypothetical Earth observation application for the farmer in a poor country is suddenly they have,

Jed Sundwall: they can get insurance for some reason. They don’t know why, but like there’s a flyer for them to get insurance or a salesperson comes and visits them and is like, hey, we can actually sell you affordable insurance now. The basis of that insurance product is Earth observation data that allows for that insurance product to exist. It is a product of data, but the farmer doesn’t have to know anything about that. The value of it gets.

built into the price of the insurance and like that’s how the value is delivered. Is there a GUI or some sort of UI to the data between the receipt of the data and the creation of that insurance product? Like maybe, maybe not, but like I think increasingly, so partially to answer Max’s question, like in the age of AI, and I know Matt, you’ve said stuff about this before, like it’s just gonna be a model doing all the analysis, you know, and

Matt Hanson: Mm-hmm.

Jed Sundwall: And what’s derived out of it is going to be some sort of index or figure that gets put into a spreadsheet or database or, you know, informs some other process. Yeah. Yeah. Yes. Which, by the way, you can preview CSVs on Source Cooperative now, which is amazing. Yeah. Go.

Matt Hanson: That’s right. It’s tabular data. The future is tabular data. Yeah.

Matt Hanson: Nice, that’s great. So, all right, so this is a bit of a tangent, but like I feel like it’s maybe a good time to say this. And I used to give a presentation and I talk about this a little bit, but, and I don’t know, this is gonna seem like a tangent, but.

Jed Sundwall: Go for it. That’s why we’re here.

Matt Hanson: You talked about the farmer, you know, and getting the app, and, you know, there’s another reason why that doesn’t really work. And it’s because remote sensing sucks. All right. RSS: remote sensing sucks. And what I mean by that is, you know, I’ve been in this space for a while, right? And if you look at old research papers, new research papers, and take a look at land cover products, for instance.

You can get land cover products from different producers for the same year using the same data. And they might be 20, 30% off, 20, 30% disagreement with each other. Because there’s a lot of stuff that goes into the image that’s formed. The entire big equation, the radiative transfer equation

for how that light propagates and gets to the image, means a lot of variability. And when we talk about level two data, we have atmospheric correction, which also adds a tremendous amount of variability. So I have this issue with the ag community because I feel like they, and I think lots of other industries have done this as well, have over-promised

and under-delivered on what remote sensing can do. You know, 20, 30% errors are not uncommon. But if you go to an engineer that is doing space exploration, right, or any other engineering discipline, and you’re like, oh, 30% errors are normal, they’re gonna laugh at you, right? We didn’t send people to the moon with 30% errors, right? You’re gonna miss the moon. So I think there’s an aspect here of

Jed Sundwall: Yeah.

Jed Sundwall: Right.

Okay.

Matt Hanson: having realistic expectations around what remote sensing is capable of. And traditionally, back before Landsat was available on S3, the people doing that work were scientists. And so I don’t think it really, it didn’t really come up that people were like misusing remote sensing data in a bad way.
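To make the "20, 30% disagreement" figure mentioned above concrete, here is a minimal sketch of how a disagreement rate between two land cover products could be computed. The two ten-pixel class maps are invented toy data, not real products, and `disagreement_rate` is a hypothetical helper name.

```python
def disagreement_rate(map_a, map_b):
    """Return the fraction of pixels where two class maps differ."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and the same length")
    differing = sum(1 for a, b in zip(map_a, map_b) if a != b)
    return differing / len(map_a)


# Toy example: the same ten pixels classified by two hypothetical producers.
producer_a = ["forest", "crop", "crop", "water", "urban",
              "forest", "crop", "grass", "forest", "water"]
producer_b = ["forest", "grass", "crop", "water", "urban",
              "crop", "crop", "crop", "forest", "water"]

rate = disagreement_rate(producer_a, producer_b)
print(f"{rate:.0%} of pixels disagree")
```

In this toy case three of the ten pixels differ, so the two maps disagree on 30% of the area even though both were meant to describe the same place.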

But once that data became available to the masses, and this kind of ties in with some of the stuff you were saying before, everybody started using this data. Startup companies were starting to leverage this to generate NDVI. I remember working with one company back then that was using that Landsat data to calculate NDVI. And the problem was, and I think I’ve told you this before, Jed, that the data was not appropriate for doing that, right? The original Landsat data that was on

AWS was level one data. It wasn’t even level one top-of-atmosphere data. It was like top-of-atmosphere prime, so it wasn’t even corrected for angles. And so I think that ended up causing more of the same problem: people continually being over-promised what remote sensing can do. So that’s my issue with the ag community.

I feel like they’ve over-promised what it’s capable of. Remote sensing is very powerful because I might not be able to measure the water quality in a lake very well, you know, within some error, but I can look at every lake in the world, right? Every day. And what it’s really, really good at is looking at relative differences. So time series

Jed Sundwall: Yeah.

Jed Sundwall: Yeah. Yeah.

Matt Hanson: is where remote sensing really shines, being able to look at change over time and differences. And then this leads into a whole other segue of this is why most commercial satellite data providers have bad business models.
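The Level-1 point above can be sketched in a few lines. This is a simplified illustration, assuming the usual Landsat Level-1 rescaling in which a digital number is converted to an uncorrected reflectance ("prime") and then divided by the sine of the sun elevation angle. The rescaling coefficients, digital numbers, and sun elevation below are illustrative placeholders; real values come from each scene’s metadata file. Atmospheric correction is not modeled at all.

```python
import math


def toa_reflectance(dn, mult, add, sun_elevation_deg=None):
    """Rescale a Level-1 digital number to top-of-atmosphere reflectance.

    With sun_elevation_deg=None this returns the uncorrected "prime" value.
    Otherwise the value is divided by sin(sun elevation).
    """
    rho_prime = mult * dn + add
    if sun_elevation_deg is None:
        return rho_prime
    return rho_prime / math.sin(math.radians(sun_elevation_deg))


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)


# Illustrative placeholders, not values from a real scene.
MULT, ADD = 2.0e-05, -0.1
red_dn, nir_dn = 9000, 20000

red_prime = toa_reflectance(red_dn, MULT, ADD)
red_corrected = toa_reflectance(red_dn, MULT, ADD, sun_elevation_deg=30.0)
nir_corrected = toa_reflectance(nir_dn, MULT, ADD, sun_elevation_deg=30.0)

print("red, uncorrected:", round(red_prime, 4))
print("red, sun-angle corrected:", round(red_corrected, 4))
print("NDVI from corrected bands:", round(ndvi(nir_corrected, red_corrected), 4))
```

In this sketch the sun-angle term scales both bands by the same factor, so it shifts the per-band reflectance rather than the NDVI ratio itself. Atmospheric effects differ by wavelength and are left out here, so this is not a full account of why scenes from different days are hard to compare.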

Jed Sundwall: Yeah.

Jed Sundwall: Okay. We should keep going down this path, I think.

Matt Hanson: No, it’s that they’re focused on this idea of selling imagery, right? Scene by scene. And there’s really limited use for that. Maybe for photogrammetrists, you know, looking at it. That’s how we originally used it: we have a high resolution image and we’re going to look at it and identify things. But the real value in all of these archives of data is the time dimension.

Jed Sundwall: Right. That’s right. Yeah.

Jed Sundwall: That’s right.

Matt Hanson: And so I don’t know, I hope for a future where those archives are maybe unlocked. Maybe there’s a subscription model where you can access the whole thing, the whole entire archive. But this whole piecemeal, image-by-image thing just seems a little ridiculous.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Jed Sundwall: Absolutely. Okay. I mean, yeah, sorry. We’re tipping into the philosophical, which is great. We get to do this. I like to say imagery is a metaphor for the data. Imagery is one way to see the data, because you want to see it, right? I went through this a bunch when I was at AWS building the open data program. I’d have executives that were like, where do I see the pictures?

Matt Hanson: Alright, there’s a bunch of things there.

Jed Sundwall: Like, what will it look like? And I’m like, well, do you know what an S3 bucket looks like? You know, it’s a bunch of objects with names. It doesn’t look like much. We had the same issue when we started hosting Hubble Space Telescope data, where people were like, I want to see pictures of the galaxies and stuff. And I’m like, yeah, that would be cool. That’s not what’s in here. This is telescope data in a weird format called FITS that has its own, you know,

Matt Hanson: yeah. Right.

Jed Sundwall: great, wonderful people trying to figure out how to cloud-optimize it. But the imagery is a derived product that’s made for a human to look at with human eyes. That’s just one tiny sliver, one tiny slice of how this data can be interpreted or used. So yeah, I do feel like I want to defend myself with the Landsat stuff. I’ll first of all just say,

Matt Hanson: Yeah.

Jed Sundwall: I didn’t know what I was doing. I was just like, well, look, we’re going to bring the Landsat data onto AWS. I had some ideas that I bandied about with Peter Becker from Esri and Frank Warmerdam at Planet. I consider them the two people that were like, you should do this internal tiling and overview thing, which ultimately became known as the COG. And

Matt Hanson: Yeah.

Jed Sundwall: That was it. But I was just like, well, we’ll just see what happens. but I’m, I guess my question though is like, is that a solvable problem? Like is any data fit for, you know, safe for public use and distribution?

Matt Hanson: Probably not. I mean, right? Any data can always be misused. And don’t get me wrong, right? That move of Landsat to the cloud was huge. It really popularized Landsat. We wouldn’t be where we are today if that very important data set wasn’t there. But at the time…

Jed Sundwall: Yeah, I don’t think so. Yeah.

Jed Sundwall: Thank you.

Matt Hanson: that data wasn’t really available. Well, it was available to people, but that’s not who was using it. Right. It was scientists, and anybody using it probably should have opened up the Landsat Data Users Handbook and read what the data was and what needed to be done with it in order to do things like compare NDVI over two different days. Because you couldn’t do that,

but people did it anyway. And I’m sure that happens with other data all over the place. Education, right? Relying on experts is a good thing. These are things that companies need to do: value that expertise in the geospatial and remote sensing domains, and not just assume that because data is easily accessible

Jed Sundwall: Yeah. Right.

Matt Hanson: and you can easily find it, you can do things without really knowing what you’re doing.

Jed Sundwall: Right, right. Well, yeah, I think I advocate for sort of permanent, constant vigilance and skepticism around everything. I mean, the history of the internet so far, you know, which was designed explicitly to improve the sharing of research data. I mean, that was Tim Berners-Lee’s goal: I want to be able to share stuff with my colleagues more easily. We’re sort of, epistemologically,

it’s very hard to say whether or not we’re better off, because yes, there’s a lot more information out there. I would assume a lot of it is accurate and great and pristine in a lot of ways, but there’s really never anything stopping anybody from twisting it, interpreting it, turning it into a narrative that fits whatever their agenda is. Let me go to the comments again. Sig asked about WMS and how it made it easy to get

Matt Hanson: Yeah.

Jed Sundwall: many large raster images, well, I’ll just put it on the stream here, to get imagery into legacy desktop and web apps. Might STAC be implemented in a similar fashion? It has been. I mean, Esri has supported STAC for a super long time. Do you have comments on that?

Matt Hanson: Yeah, STAC. There’s a new QGIS feature. There’s a STAC plugin that actually works really well. So yeah, I think that’s already happening.

Jed Sundwall: Yeah, it is happening. And then from CJ Levinson: I’m curious to hear how this conversation extends to model data sets as opposed to remotely sensed data, and how this relates to my point, Jed’s point, of good data products being about making fewer decisions. So thinking about climate models, weather models, mostly modeling outputs, which would be the main geospatial artifacts. So yeah, I mean, Element 84 has done some great thinking on

embeddings data products and things like that. I think that’s relevant here. What’s your thought on this, man?

Matt Hanson: Yeah, well, there’s a couple of aspects here. There’s the aspect of how these model data sets, these generally large homogeneous model data sets, fit into STAC. But I’m not sure that’s the question. Is that the question?

Jed Sundwall: No, yeah, less about STAC. It’s more, I think, about how we’re talking about, you know, data that’s fit to be shared and fit to be used. And now we’re dealing with data products that are just model outputs. So a model’s done a bunch of magic on them.

Matt Hanson: Right.

Matt Hanson: Yeah. So I think that gets into your curve, right? Which is, you know, where in the curve is that modeled output? But generally speaking, I think that’s what we want, right? This is what users want: they want the modeled output. They don’t want level two Landsat data. They don’t even want level three. What they want is planetary variables. Planet’s Planetary Variables dataset is exactly the type of thing

that we need to see more of, I think, where this isn’t imagery, this isn’t time series, this is like, I’m looking for a particular type of data variable, and I can get that, and it’s been derived from imagery, but it’s gone through a process that weeds out all those edge cases and everything. I think planetary variables are great.

That’s a great data product right there.

Jed Sundwall: Yeah. I would also say, this is the time to shout out Dynamical. The Dynamical podcast, which is called Weathering, is just an absolute delight. This is from the people who build Upstream Tech. Anyway, they have this great podcast where they’ll actually read papers on weather forecasting and advances in weather forecasting. And in a recent episode, if I, uh,

Let me see if I can remember which one it was, but it was, I think it’s the one.

on, yeah, that’s the most recent one, a taxonomy of bias, sense-making, heretical physics and the Tom Hanks, Bill Murray multiverse. It’s a good episode. But where I think they discuss how like, you know, we already interact with a lot of models and develop opinions of them over time based on their usefulness. Right? So like you were saying before, like a lot of satellite imagery has these like insane, you know, error rates or whatever. They just have like,

substantial error rates, right? They still might be useful. You know, there’s the adage that all models are wrong, but some are useful. And so, yeah, I’m just going to agree with you and say this is what we want: to have models that are able to distill data into things like planetary variables, or basically things that can support decision-making. And I think people aren’t idiots.

You know, they’ll figure out: is this useful to me or not? And it’s possible that sometimes the model gives you something that’s catastrophically bad and you lose money on it. And you’ll be able to make a decision whether or not you want to trust that model again. It’s the way the world works. I think it’s so easy to overthink this sort of stuff, you know,

Matt Hanson: Right.

Matt Hanson: Mm-hmm. Mm-hmm.

Jed Sundwall: man, I’ve missed out on LinkedIn. People have been saying stuff.

Matt Hanson: Uh-huh. So while you, okay, before you do that, have I told you about my Star Trek theory of remote sensing? Have I ever? Okay. Well, we’re on a podcast, so I’ll have to explain it now anyway, even if you had said, yes, I’ve heard this before. So if we look at Star Trek, right, my whole vision of the future, I hope, is way more Star Trek

Jed Sundwall: Yeah. Go. No, no, no. Go for it.

Jed Sundwall: I love this. Remind me.

Jed Sundwall: Yeah, yeah.

Matt Hanson: than a more dystopian version. But in Star Trek, you have tricorders, right? And you have sensors. And what are those sensors not doing? They’re not sending back images that are then analyzed, right? You’re scanning for life. You’re scanning for a particular element. You’re scanning for specific variables. And I think that maybe there’s an aspect here. Like we’re creating general purpose satellites,

historically Landsat, right? It’s like, well, we don’t really know, this could be used for a bunch of different things. But we’re increasingly, I think, seeing companies that are coming up and creating satellites for particular, specific verticals, selling the satellites and satellite as a service. And I think ultimately maybe that’s where remote sensing goes, where there isn’t a satellite that’s taking an image and then we’re downlinking that and then

figuring out a bunch of different use cases and using it for a bunch of different use cases, but rather it’s like, no. You see this with GHGSat, right? It’s like, no, this is a satellite for detecting methane. It’s a single-purpose thing. It’s the Star Trek thing. It’s like scan for life. That might actually be an optical satellite, or it’s a SAR or something like that, but it’s doing something on board and then sending back just the thing.

Jed Sundwall: Yeah.

Jed Sundwall: Right.

Jed Sundwall: Yeah, okay, sorry, now I’ve got it, this is great.

Jed Sundwall: Go Star Trek. It’s funny, I’m not a Trekkie by any means. I did watch the Next Generation a bit when I was a kid, really liked it. But I brought Star Trek up at some recent open data event that I was at. just being like, because people are like, are there any examples of like literature or stories about like the future of like technology where like things are good? And I’m like, I think Star Trek is like one of those, you know? Yeah.

Matt Hanson: Yeah.

Matt Hanson: yeah, yeah, it’s, yeah.

Jed Sundwall: Cause we’re just so steeped, and we have been for many years, in kind of dystopian technological stories and stuff like that. And I think we should keep Star Trek in mind as a vision of where we could take things. You reminded me though. So last week, a bunch of our friends were at a National Academies of Sciences workshop on Earth observation and the future of data stewardship. And I pitched basically what you just said, in a way. I mean,

Matt Hanson: Yeah, absolutely.

Jed Sundwall: We worked within groups to come up with a 20-year strategy. And I had some license to kind of steer the Ouija board, as I would say. We’re all hacking on these ideas, but this really wasn’t my idea, because it really did come out of the group. It was just sort of this realization that I think we know a few things that we want to accomplish in terms of governance and, let’s say, environmental management or something like that.

Matt Hanson: Right.

Jed Sundwall: And rather than looking at the next 20 years of Earth observations and thinking, well, what sensors do we need? You know, and what file format should they be in? What should the standards be, and who should pay for it? What I led with when I was reading out from the group was: I think if we’re thinking 20 years ahead, we should assume there will be more sensors. There are going to be more data products. There are going to be more models producing all sorts of stuff, more users doing weird things that we could have never anticipated. And what we should probably do,

Jed Sundwall: And I cannot emphasize how hard this was for me to say out loud. We should maybe look at something like the sustainable development goals. I like to make fun of the sustainable development goals because it’s just like kind of a bunch of hot air in terms of like, it’s like, that’s nice that you created these goals, but like, really? Like, is anybody going to do anything about this? And, but the truth is like, well, we, but we should, you know, so like one is like, we should like, it’s, it’s just like,

Matt Hanson: No.

Matt Hanson: Yeah, we should. Yeah.

Jed Sundwall: I make fun of them. Sorry, everybody. But the UN doesn’t really have the ability to herd the cats that are nation states to get them to do stuff, right? I think this has been demonstrated. But the sustainable development goals are really good goals. So it’s like, hey, you know, we really want to ensure that everyone in the world has access to clean drinking water. And going back to your point, what do we need to do that? And it’s like, it could be

Matt Hanson: Yeah. Yeah.

Jed Sundwall: any number of different types of sensors, and we should have some sort of entity that is actually held accountable for making the end result happen. And who knows what kind of sensors they’re going to use. You know, we don’t need to say that. I mean, it might come to be the case that we need something like GHGSat, you know, and the community that’s driving at that specific goal can determine that.

Matt Hanson: I know.

Matt Hanson: Yeah. And they’ll need dedicated satellites to do that. Right? This whole idea of shared satellites for all these different use cases, there’s just not enough tasking capacity. And the power is in time series, and you’re maybe lucky to get an image every other month. You really need a dedicated satellite for the purpose, I think.

Jed Sundwall: Hmm.

Jed Sundwall: Interesting.

Okay, I don’t have strong opinions about this. I’ve kind of always thought there’s likely latent capacity in the satellites that we do have up that people just can’t get access to, right? So, huge fan of Common Space, for example. Well, it’s an example worth debating. I mean, you know, we were fiscal sponsors of Common Space, you know,

Matt Hanson: Yeah, I mean there might be, but yeah, Common Space, right? This is a great example.

Jed Sundwall: Bill was listening in here, a glorious initiative. But I think there’s still plenty of debate to be had, which is: does Common Space need its own satellite? Or is there actually just a legal, financial, or policy hack that could make existing sensors actually useful for the humanitarian realm? It might be easier just to launch your own satellite at this point.

which is why I’m glad they’re trying to do it. But it’s, I think it’s a worthwhile debate.

Matt Hanson: Yeah, I think, yeah, my sense is that it is. And especially if you want full control over it and you want to revisit the same areas over and over again, like even for a disaster, right? Like, we focus on

imagery after there’s some disaster, like ideally you’d want to continue to look at that same area for some months afterwards to see about the recovery efforts or like there’s flooding, like how long does that take for the flood waters to recede? And so I just don’t see how you could get that much data unless you’re actually controlling the satellite and the ability to look at the same areas over and over again. Same thing with like infrastructure, right?

like companies that own and operate global infrastructure. Like, yeah, it totally makes sense for them to just own their own satellites. And like these things are pointing at the exact same areas day after day.

Jed Sundwall: Yeah. Huh. I wonder, is Munich Re going to fly its own satellites soon? You know, it seems like, yeah.

Matt Hanson: I mean, it’s getting more and more cost effective, right? I mean, we’re seeing companies like pivot towards, you know what? We’re not actually going to sell pixels anymore. We’re going to build satellites. And, I think big companies, there’s lots of countries in the world too. Like this is, this seems like this is where the business is heading is smaller, cheaper purpose built satellites.

Jed Sundwall: Yeah. Yeah. All right, we’re in agreement. I mean, this is again what I was saying at this National Academy of Sciences thing last week. I’ll say, there were plenty of people that are like, oh no, only the government can do this, everyone knows that. And I’m like, I don’t think that’s true. I think we’re going to see more satellites being flown by more actors. Linda’s chiming in on LinkedIn saying she agrees with the need for dedicated, purpose-built satellites. And then yeah, Bill’s open for the debate, but yeah, I think there’s a.

Matt Hanson: Yeah.

Matt Hanson: Nice.

Jed Sundwall: I find this compelling. I want to get to, so Tim Bailey asked earlier about the issue of human inspection to validate interpretation. He says, I work in the forest wildfire resilience field, where there’s a stampede of new data products that are not great data products. So yeah, I mean, we’re going back to the error rate issue and kind of the issue of

This is, he posted this a while ago when we were talking about models and accuracy and, you know, making it, you know, actually like informing decision support systems. I’ll also bring up relevant to this is that I think Bloomberg published a story that’s been going around on, LinkedIn this week about Zillow removing climate risk information from its listings. I think.

Zillow and Redfin, they used to show flood risk and fire risk. This is data that comes from First Street Foundation, and they took it out. They took it off. And the issue being that people increasingly are encountering decision support information, which could be fire risk for your house, you know, or for the house that you’re thinking about buying, but it’s coming from entities that people aren’t sure whether or not they can trust.

And I think First Street, kudos to them, demonstrably have produced models that are better than FEMA’s models or anything that the government’s been able to produce. But still, validating that sort of information is difficult. And I’m perceiving a need for, I always say, new data institutions: arbiters that can actually help validate this stuff. Anyway.

Matt Hanson: Yeah.

Jed Sundwall: Over to you, in case you have, I want to go back there.

Matt Hanson: Yeah. Yeah. So, well, I think Tim has a great idea for a name for you: you can start another podcast called Not Great Data Products, where you evaluate really crappy data sets. Like, this is the worst, you know, it’s like,

Jed Sundwall: That should be like, we should do special episodes every now and then just like, just talk trash about.

Matt Hanson: Yeah. Yeah. Not great. Yeah. This is the worst. So yeah, I feel like I just keep on ranting on this podcast. But, you know, I think we have a real problem with startup companies especially. I don’t know, maybe this is a worldwide problem. I mean, I see it

Jed Sundwall: Do it! That’s the whole thing.

Matt Hanson: you know, really prevalent in the US here. Startup companies doing really questionable science. And there’s, there’s, because they’re at odds, right? Like the business model that they have is completely at odds with the scientists. I mean, I guess we’ve seen this in, we’ve seen this in very high profile cases outside of the geospatial industry. But like we see it in the geospatial industry as well.

And people making promises for things that really just aren’t practical, and overpromising and underdelivering. And so, yeah, I’m not surprised that Tim has come across a lot of really not great data products. I don’t know about the source of those, but I’ve seen that. I’ve seen that quite a bit.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah. Well, I mean, look, it’s constant. And I’ll say, this is why I’m at Radiant Earth, right? It’s not why I left Amazon. Amazon is great. I had a very good eight years there. But what I realized was, no, we do need to have institutions that understand how to provide data but that aren’t owned by investors. Right. So they don’t have

Matt Hanson: It’s just constant.

Matt Hanson: Yes.

Jed Sundwall: the same sort of forever-growth incentive. Which is not to say, I should say, some of my best friends are investors. That might not be true, but I have plenty of friends who are investors or are funded by investors. I don’t think investors have inherently malicious intent. What I would say is that investor-owned or investor-governed companies that are united by the need to grow constantly

are not always going to be the best stewards of data. And I would say in almost all cases, they almost can’t be. The pressure to enshittify is unavoidable. And then also the competitive need precludes them from being truly open about their models and how they operate, right? It has to be secret sauce, which I think if you’re saying like,

Matt Hanson: Mm-hmm.

Jed Sundwall: If you’re going out there and saying, hey, we have the data that is going to be used to regulate the environment and the real estate market and risks to human life, risks to life on Earth, you need to be held to a higher standard than just, it’s good, trust us, it’s our proprietary secret sauce. So.

Matt Hanson: Yeah. Yeah, yeah. And we can bring that back to STAC actually now, because a few years ago, well, there’s been an effort with STAC coordinating with CEOS. So Matthias Mohr has done a bunch of this work. I was involved. And so CEOS, which is an international committee of space agencies,

Jed Sundwall: yes. Yeah. Bring it home.

Jed Sundwall: Well, yeah, we’ve been, he does that under the umbrella of Radiant Earth,

Matt Hanson: has a thing called ARD, CEOS-ARD, analysis ready data. And so Matthias has been doing work on mapping their requirements for ARD back to STAC. And so when I was involved with this a bit some years ago, in the early days, we were identifying what fields really need to be included in

this for them to get the ARD certification or whatever from CEOS. And I think the immediate problem that I saw was that you really need and require radiometric and geometric accuracy to be published in that metadata. And I don’t think that there’s a ton, I could be wrong, but I don’t think there’s a ton of commercial

satellite companies that are really willing to do that.
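
[Editor’s sketch] To make the metadata point concrete, here is a minimal Python sketch of checking whether a STAC-style Item publishes accuracy figures. The two property names are hypothetical, invented for illustration; they are not taken from CEOS-ARD or from any published STAC extension, and all values are placeholders.

```python
from typing import Any

# Hypothetical property names, invented for this sketch. They are not taken
# from CEOS-ARD or from any published STAC extension.
REQUIRED_ACCURACY_FIELDS = (
    "example:geometric_accuracy_m",
    "example:radiometric_accuracy_pct",
)


def missing_accuracy_fields(item: dict[str, Any]) -> list[str]:
    """Return the required accuracy fields that the item does not publish."""
    properties = item.get("properties", {})
    return [name for name in REQUIRED_ACCURACY_FIELDS if properties.get(name) is None]


# A minimal STAC-style Item; every value here is a placeholder.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "geometry": None,
    "properties": {
        "datetime": "2024-01-01T00:00:00Z",
        "example:geometric_accuracy_m": 12.0,
    },
    "links": [],
    "assets": {},
}

# Lists the accuracy fields this item fails to publish.
print(missing_accuracy_fields(item))
```

A check like this only confirms that a number was published, not that the number is good, which is the harder problem Matt is describing.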

Jed Sundwall: Interesting because of the proprietary nature of what they do.

Matt Hanson: Because their satellites suck for the most part, because they’re CubeSats. They’re low-cost, cheap things. Now, maybe I’ll get a whole bunch of people mad at me, which somebody told me recently means that you’re doing something right. But, you know, I don’t want to make a blanket statement about all of it. I love satellite companies, right? Some of my best friends are satellite companies.

Jed Sundwall: Okay.

Jed Sundwall: Amazing.

Jed Sundwall: Some of my best friends are satellite. Yeah.

Matt Hanson: But like, but the real, the, but the realistic assessment is that these are lower cost, cheaper satellites. And, and the radiometric accuracy is not going to be up to snuff compared to giant school bus size satellites like Landsat is.

Jed Sundwall: Yeah. Well, interesting. I mean, this is a solvable problem. I think you’re just highlighting that it needs to be solved, if we are talking about a future in which more people are deploying sensors and we’re having more low-cost sensors going up. A lot of those are going to be CubeSats, you know. But again, I guess the requirements are going to be bespoke in the case of every sensor,

Matt Hanson: Mm-hmm.

Jed Sundwall: to determine, okay, does this meet our needs? You know, I’m a reinsurer. I need to have control of my own satellite that I can task, but this is what I need. Interesting.

Okay. I put in links in the, in the chat to a blog post that Matthias wrote about the sort of the cloud native approach to, to doing this stuff. So, or to ARD and, and yeah, and shout out and thanks to, to NASA for funding us to be able to do that work with Matthias because it’s, it’s been great. Okay. Well,

We’ve covered a lot of ground here. I mean, I love talking to you. This is one of the great things about doing this: I don’t know the last time I had an hour and a half or so to just talk to you about stuff. So it’s been a real treat for me. Is there anything else you want to mention? We didn’t talk about my white paper. Did you read my white paper? Okay.

Matt Hanson: Yeah, I did, yesterday. I mean, a lot of great alignment with a lot of things. So yeah, there’s one thing, perhaps we can talk about this credibility issue, because as I told you yesterday, you know, I wrote this blog post and afterwards a colleague of mine

Jed Sundwall: Yeah, yeah.

Matt Hanson: was like, well, there’s something missing from this post, in that it seems like there’s something else that’s required here that you didn’t mention. And I think that thing is this credibility issue. And what I mean by that is, if some random person, this happens a lot, right? They create a really cool thing and then they go out there and they’re like, hey, help me with this thing, I want to create a standard. They just might not get a whole

lot of traction from that. And with STAC we had some credibility because of Chris Holmes. Chris started it, and he had a good reputation, and he’d been involved with OSGeo. He knew a lot of people. He brought that credibility to it. And we see companies, like you mentioned the New York Times, right, with RSS, or, you know, Google and Meta, like they

Jed Sundwall: Yeah.

Jed Sundwall: Yep, yeah, that’s right.

Matt Hanson: come out with standards, right? Like all the time, because they have this credibility. They’re not guerrilla standards, right? They don’t actually build them in a community, but they have enough credibility and weight behind them that they can accomplish a similar thing, which is: this is a standard, use it, and people start using it.

Jed Sundwall: Yeah. For some reason I feel very compelled to share, I’ll also put in the chat, a link to “You Just Haven’t Earned It Yet, Baby” by The Smiths. Morrissey at his finest, you know. It is a harsh truth that you will confront throughout your life when you’re trying to do anything: you do need to earn that credibility. Right. And so,

So my white paper is called Emergent Standards and basically it’s an exploration of how do standards emerge without an authority coming and saying like, thou shalt do this, right? Linda just commented on LinkedIn, H3 and Uber is another great example where it’s like, Uber clearly knows what they’re doing and H3 was obviously good.

Matt Hanson: Yes. Yep.

Matt Hanson: Mm-hmm.

Jed Sundwall: you know, for, for what it does. And they opened that up and it’s great. And now, you know, we talk about H3 a lot. the, so, so it’s, it’s this interesting, it’s all sweet spots. Like you can’t, we have many examples of institutions that are powerful and have sway in a lot of ways, trying to decree standards that just don’t work because

They are not actually aligned with what practitioners want. And so practitioners can come up with their own thing, but you still have to have a Chris Holmes in the group. You have to have somebody who has the convening power or the credibility or something to actually get people to pay attention. Which is a drag, because it’s like, well, how do you do that? I’m like, I actually don’t know.

Matt Hanson: Yeah, exactly.

Jed Sundwall: Like it’s, it, feels like a historical accident in cases like when, when stuff like that works out. And that’s actually probably true. Most of history is a series of accidents. Yeah. Yeah.

Matt Hanson: I think that’s true. Yep. I think that’s true. Right. There’s been a lot of research into this. You know, you’re probably familiar with this more than I am. If you look at the path of Bill Gates and other folks that have become billionaires or founded big companies, it’s a lot of being in the right place at the right time. It’s a lot of happenstance. It’s a lot of luck. It’s not just because he was brilliant and he just did stuff and it was like, that

Jed Sundwall: yeah.

Matt Hanson: Like if he lived in another time, if Bill Gates lived in another time or Elon lived in another time, right? They wouldn’t be the billionaires they are today. Our whole life is pretty much dictated by luck.

Jed Sundwall: Yeah. Yeah. Actually, so one final bit of self-promotion that I’m allowed to do here: in the latest episode of texts on texts, my other podcast about literature, we talk about a short story called “Anxiety Is the Dizziness of Freedom” by Ted Chiang. It’s awesome. It’s totally relevant to what you just said, which is

Matt Hanson: Okay, cool.

Jed Sundwall: It describes a device where you flip a switch and it creates a parallel universe that you can communicate with. So you can communicate with what’s called a paraself, a parallel version of yourself. And it drives people crazy. It just causes all sorts of issues for people. There’s a guy who’s like, my parallel self has a girlfriend and I don’t, what’s wrong with me? Yeah, it basically reveals to people how much of their lives are

Matt Hanson: that.

Jed Sundwall: pretty much out of their control. Anyway, we’ve gone very far afield, but to bring it back to the point of STAC and how this stuff is created: you do just have to try to do this sort of stuff. You’ve got to try. I think what STAC has demonstrated is that it is possible. And I do think that there are

Matt Hanson: You gotta try.

Jed Sundwall: parts of this playbook that can be documented and repeated. But part of that includes like having, you said it before, building with the community and finding champions. And you have to do that on purpose. So.

Matt Hanson: Yeah, you do. Yeah. And even just being engaged with the community, even if you are building stuff internally, I do feel like the more that you are engaged with the community, the better that thing is going to be. So if you’re working with STAC, even if it’s for internal use, come to the STAC community meetings, you know, and let people know what you’re up to, and maybe you’ll get some good feedback. Like,

Jed Sundwall: Yes.

Matt Hanson: It’s definitely, you’re gonna be better off, I think. You’re gonna be in a better position the more you work with a larger diverse group of people.

Jed Sundwall: Absolutely. Well, yeah, where should we point people? We can send people to stacspec.org, where you can learn everything you need to know. As far as getting involved in the community meetings, where do we point people to?

Matt Hanson: There’s a Google group that should be, is it on the webpage?

Jed Sundwall: I’m looking around, and I’m noticing the stacspec.org site is directing people to our Discourse, which we don’t support anymore. So.

Matt Hanson: Okay, yeah, so there’s some more things that we need to do. So the STAC Steering Committee, we actually, I think, have a meeting in the next week. Maybe it’s tomorrow. And we’re trying to clean up some of these things. So, yeah.

Jed Sundwall: All right, well, stay tuned then: stacspec.org. Matt, you’re easy to connect with on LinkedIn and stuff like that. Maybe, I don’t know. You can join the Cloud-Native Geospatial Forum. There’s plenty of people in our Slack, but we do ask people to pay to join that. It’s not a lot of money. But yeah, there are lots of places to get involved. Stay tuned and look at stacspec.org and see what you can find there.

Matt Hanson: yeah, yeah.

Jed Sundwall: All right, this has been awesome. Thanks, Matt, for coming on. I predict that we’ll have you on again, because we’ll be doing this forever. And thanks for everything you’ve done for the community.

Matt Hanson: Bye.

Matt Hanson: Yeah, no, thanks for doing this, Jed. Yeah, no, it’s been, this has been fun. I love the chat, so, you know, anytime.

Jed Sundwall: Any time. All right. Well, happy holidays. All right. Bye.

Matt Hanson: All right, you too. Bye bye.

Jed Sundwall: Okay, stay in

→ Episode 2: Protomaps and PMTiles


Video also available on LinkedIn

Show notes

Jed talks with Brandon Liu about building maps for the web with Protomaps and PMTiles. We cover why new formats won’t work without a compelling application, how a single-file base map functions as a reusable data product, designing simple specs for long-term usability, and how object storage-based approaches can replace server-based stacks while staying fast and easy to integrate. Many thanks to our listeners from Norway and Egypt who stayed up very late for the live stream!

Key takeaways

  1. Ship a killer app if you want a new format to gain traction — The Protomaps base map is the product that makes the PMTiles format matter.
  2. Single-file, object storage first — PMTiles runs from a bucket or an SD card, with a browser-based viewer for offline use.
  3. Design simple, future‑proof specifications — Keep formats small and reimplementable with minimal dependencies; simplicity preserves longevity and portability.
  4. Prioritize the developer experience — Single-binary installs, easy local preview, and eliminating incidental complexity drive adoption more than raw capability.
  5. Build the right pipeline for the job — Separate visualization-optimized packaging from analysis-ready data; don’t force one format to do everything.
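
[Editor’s sketch] Takeaway 2 rests on the idea that a client can read a small slice of one large file sitting in a bucket. The Python sketch below shows that pattern under some assumptions: it relies on the PMTiles version 3 layout, in which the archive starts with the 7-byte magic `PMTiles`, a version byte, and a 127-byte fixed header, and on the host honoring HTTP Range requests. `fetch_header` and any URL passed to it are illustrative only, not part of any Protomaps API.

```python
import urllib.request

PMTILES_MAGIC = b"PMTiles"  # first 7 bytes of an archive
HEADER_LENGTH = 127         # fixed header size in version 3 of the spec


def range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Range end positions are inclusive, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def spec_version(header: bytes) -> int:
    """Check the magic number and return the spec version byte."""
    if len(header) < 8 or header[:7] != PMTILES_MAGIC:
        raise ValueError("not a PMTiles archive")
    return header[7]


def fetch_header(url: str) -> bytes:
    """Fetch only the fixed-size header of a remote archive (needs network access)."""
    request = urllib.request.Request(url, headers=range_header(0, HEADER_LENGTH))
    with urllib.request.urlopen(request) as response:
        return response.read()


print(range_header(0, HEADER_LENGTH))
```

Real clients go on to read the tile directories the same way, a few byte ranges at a time, which is why no tile server is needed.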

Transcript

(this is an auto-generated transcript and may contain errors)

Jed Sundwall: So I’m going to start it. First of all, happy Halloween, Brandon. Welcome to a special edition of Goth Data Products, in case anyone’s wondering why we’re both red, if they’re watching. Those listening to the audio only don’t get the benefit of seeing us in this kind of spooky color scheme. But welcome.

Brandon Liu: Thanks.

Brandon Liu: Yeah, so thanks for having me on the podcast. I’m excited to talk about, you know, Protomaps, data, Source Cooperative. So I’m here to answer questions, I guess. Yeah.

Jed Sundwall: Yeah, no, likewise. By the way, sorry, I did chicken out. I’ve changed the lighting, so I’m not red anymore. Yeah, no, it’s great to have you. I mean, when we started this thing, you were really top of mind as somebody who’s been very thoughtful about what we call the ergonomics of data, like figuring out how to make a lot of data accessible for people. So if you don’t mind, let’s just start there. Like, can you just

Brandon Liu: Right.

Jed Sundwall: How do you describe yourself and what you do?

Brandon Liu: so the way I describe myself is,

I started a project called Protomaps six or seven years ago and the impetus for this was making it easy to make a map. And the direction that came from was very much just like, you think about a web developer that is making a website, like, so for example, they’re making like a site to look up different cafes in their neighborhood.

they might use something like Google Maps, but that is a proprietary SaaS that they buy. And so I really wanted a way to have a home-cooked way to make a map, because there’s so many things you can publish on the web. You’re able to publish videos, you’re able to publish pictures or markdown or HTML, but being able to publish an interactive map has never been that way. So really the way I approach this is from

the idea of making it accessible for anyone to publish a map.

Jed Sundwall: Got it. Okay. Amazing. And you’ve done it. And so you’ve reminded me. I’m just going to say these things out loud, which is kind of funny: part of the reason for doing this podcast is that we’re doing so much stuff at Radiant Earth and we need more channels to be able to talk about it. And so just last week, we put out a white paper, and this will be in the show notes and I’ll put it in the chats, but it’s called Emergent Standards. And what you said is very relevant to this, which is that in the paper, I argue that the web has turned out to be an engine that helps people come up with new data standards. And so if you look at it through that lens, you have HTML, which is, let’s share hyperlinked documents with one another.

And then you end up, you’re like, well, what if I don’t want to load up a webpage, but I want a feed of updates? And so RSS emerged out of that. GTFS emerged out of the need for standardized transit information. And I would say what you’re doing, specifically with PMTiles, is a way to do this for vector tiles.

Brandon Liu: Yeah, I have a lot of, I guess, thoughts about the idea of standards in general, both in the web and also for geo. I think for a lot of the web, we think about them as standards. For example, HTML evolved very early. And maybe on the early web, it was a lot more, sort of, in the design phase, where people would collaborate on creating some spec and that became a standard.

Nowadays, what you see is it’s more like one of the big companies that makes browsers, like Google or Microsoft, makes everyone adopt a standard because it’s in their incentive to do so. If Google can convince everyone to use, what is it, like JPEG 2000 instead of plain JPEG, then they can reduce the amount of bandwidth on the internet by 20%. And all that tech

around things like serving video, serving audio and images is all very mature, to where you don’t really see a lot of emerging standards being adopted organically. It’s more like, there’s this committee at these huge companies that all collaborate on a standard. There are some examples of more small-scale solutions that became adopted. And that’s really how I see PMTiles fitting in with them.

is like, I don’t want it to be top down. Like I don’t want people to like make their organizations adopt PMTiles. I want people to use it because it solves the problem for them. There is a really cool format for images that I like. It’s called QOI. I think like it stands for literally like the quite okay image format. Like it’s very modest and it’s like, like it’s its name. But I think it is just like one guy came up with

Jed Sundwall: Right.

Jed Sundwall: Okay.

Brandon Liu: a way to do lossless compression of images that is a lot simpler than PNG and is good enough. It’s not more optimized, but it’s way faster to decode on a CPU thread. And that is one good example of a, not a standard from a standards body, but of something that had a simple design that became popular. And it was not adopted because it’s like,

Jed Sundwall: How popular? I’ve never heard of it.

Brandon Liu: I think it’s used, the original motivation was for games. If you have game assets and you need to be able to decompress them and move them around in raw RGB formats, then QOI is supported by some of those engines. But actually, another one you mentioned is GTFS. So GTFS is more geo-adjacent. And that also came out of, I think, Google’s requirement to have some

systematic way of storing transit routes. But it wasn’t like some sort of consortium of transit agencies that came together to design like this like CSV format. It was like, it just became a widely adopted solution because it happened to be good enough.

Jed Sundwall: Right.

Yeah, well,

Brandon Liu: And that’s really how I see PMTiles. Yeah.

Jed Sundwall: Yeah. Well, so yeah, I mean, it happened to be good enough. Also Google had this cruise ship that everybody wanted to get on. I think, I don’t know who first said that, described it that way, but like every transit agency in the world was like, we want our data to be in Google maps. And so they had an incentive to do that. And so that’s a concept we explore in the white paper, which is like, you do need this mix of like good enoughness, because that is usually where things land is you have something that’s

good enough for a lot of people to adopt. They’re like, this is fine. I mean, sorry, what’s the acronym for the image format? Adequate, what is it? Quite okay. I love it. That’s usually where things land. The story of RSS is a bunch of people fighting and a bunch of top-down attempts at syndication until people kind of threw their hands up. But tellingly, then the New York Times adopted it

Brandon Liu: quite okay. Yeah.

Jed Sundwall: and started publishing RSS feeds and everyone’s like, okay, this is what we’re doing now. So it’s fascinating to see. Do you have any sense for the traction of PMTiles as being like this? Like, who’s using it?

Brandon Liu: So I have a couple of proxy ways. I don’t actually know how many people are using it because by nature, I can’t track. I can’t add a tracking pixel each time someone looks at a map. The one thing I can track is the number of NPM downloads. So NPM is the package manager for JavaScript. And that is, I think, the most popular client for reading PMTiles. And it’s something that I’ve

Jed Sundwall: Yeah.

Brandon Liu: it’s something that I maintain, and that crossed like 100,000 downloads this year. It’s either per month or per week, I can’t remember. So you can see a growth curve of people using this library. Now, I don’t actually know if that means anything, because it could just be an automated CI script, like on GitHub Actions, that is downloading it a thousand times. But it has some correlation with usage. So the only way that I can kind of

see if people are using PMTiles or if it’s being adopted is through this like proxy metric like NPM downloads. Or people show me like a site that is built using it. So actually like probably the biggest one is like I think the New York Times had a visualization on their homepage that was about like a space debris falling to earth. And that used a map data set that was served from PMTiles.

Jed Sundwall: Okay.

Brandon Liu: So probably like a dataset that’s being served on the New York Times like front page in PMTiles format is like probably like the most high traffic use of it.
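
[Editor’s sketch] For readers who want to look at the proxy metric Brandon describes, here is a small Python sketch against the public npm download-counts endpoint. The package name `pmtiles` is an assumption about which npm package he means, and the sample payload uses a made-up number for illustration, not real data.

```python
import json
import urllib.request


def downloads_url(package: str, period: str = "last-month") -> str:
    """Build the npm download-counts URL for one package and period."""
    return f"https://api.npmjs.org/downloads/point/{period}/{package}"


def parse_downloads(payload: str) -> int:
    """Extract the download count from the JSON response body."""
    return int(json.loads(payload)["downloads"])


def fetch_downloads(package: str, period: str = "last-month") -> int:
    """Fetch the live count (needs network access)."""
    with urllib.request.urlopen(downloads_url(package, period)) as response:
        return parse_downloads(response.read().decode("utf-8"))


# Made-up payload in the shape the endpoint returns; the number is a placeholder.
sample = '{"downloads": 12345, "start": "2024-01-01", "end": "2024-01-30", "package": "pmtiles"}'
print(downloads_url("pmtiles"), parse_downloads(sample))
```

As Brandon notes, counts like this include automated CI installs, so they track usage only loosely.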

Jed Sundwall: Okay.

Jed Sundwall: Here we are again with the New York Times. It’s really kind of interesting. I mean, you think about the legacy of the New York Times, the story about them sort of crowning RSS the standard for syndication, that’s true. They did that. And they do have the imprimatur to do that kind of thing, which is awesome. That’s great. That is a sign that you’ve made it. Shout out to Tim Wallace, who probably had something to do with

Brandon Liu: Okay.

Jed Sundwall: with the New York Times using PMTiles. That’s awesome. Okay. Well, so one thing I can say though is like on source, you know, we host a lot of PMTiles files and you can correct me on all of this. Like there are some kind of like base map objects I think that are in there or something like that. But if I search GitHub, it’s one of my favorite things to do is to search GitHub for…

references to the Source data proxy, which is data.source.coop. As of earlier today, 612 results pop up when I search for it. But a lot of them are to PMTiles files. Do you have any insight into that?

Brandon Liu: Right. So the project I run is sort of an umbrella project called Protomaps, and PMTiles is just one part. And that was by design, because I never thought it would be good enough to just design a format. If you design a format, then you also have to have some killer app that makes people actually care. Because with just a spec and some implementations, people are like, that’s cool. But like,

Jed Sundwall: Or aware of that? Yeah.

Jed Sundwall: Yes, yeah. Okay.

Jed Sundwall: All

Brandon Liu: I can’t immediately take advantage of it. So the way I approached it was to have a killer app, which is a base map, or what people think of when they think of a map: you look at it and there’s city names and there’s water and roads and stuff. That’s based on OSM. So the actual data product that is open source and free by default in the PMTiles format is this base map

that’s from OSM. And I think a lot of the links to source are to that because going back to what I started with, it’s like if people just want some solution for showing a map on their site, you know, like as an open source replacement to Google that they can run themselves, that they can copy and they can move around, that they can download as if it was a video or an image. But I imagine a lot of the links are to that just because it’s designed to be something that’s like immediately useful.

Jed Sundwall: Yeah, that’s what I was guessing.

Brandon Liu: Now I think with source, I think the CORS policy is like quite open. So if there’s other data sets, like a scientific data set that is in PMTiles format, people could link to that. And hopefully people do that more or they download from source and mirror to their own buckets and use that.

Jed Sundwall: Yeah. Yeah. Yeah. So I mean, this is something that we have to, we’re going to have to do our own analysis on this at some point, which is like, what is the cost of us hosting those, those objects? Cause yeah, our CORS policy is wide open so people can do that. And we can do the math on this, but I mean, you know, shout out to AWS. Thank you to the AWS Open Data Program that still exists

after yesterday. Anyway, it was a tough day for a lot of people at Amazon yesterday. There were a lot of layoffs, but the Open Data Program is alive and kicking. And so they subsidize all of our storage and bandwidth for source. But we do want to get serious about this at some point and have an understanding of like, how much should it really cost to do something like this at what scale? We have…

Brandon Liu: Yeah.

Jed Sundwall: All the analytics we need, we just haven’t sifted through the data yet to figure out like which of those objects are being hit the most and how much and what’s the throughput that’s going out. Cause I know you’ve done analysis on the costs of doing these things. I imagine you have some data on how much it costs to deploy PMTiles, but we also have a lot of this data, but we just haven’t shared it yet. So.

Brandon Liu: Right. So going back to that for a moment though, like, so I wonder if you think about like, like that idea of like being able to search GitHub for all the links to source for people that are like hotlinking to it. Like in some sense, like I think it’s, it’s not directly correlated to success, just the number of people that are consuming source. If people are making a copy of the data, if people are copying the data they get from source to their own bucket and then using that.

Jed Sundwall: Yeah. Yeah.

Jed Sundwall: Of course not, yeah.

Brandon Liu: That is still like using the platform as intended. Like there isn’t really like by design, I don’t know if source is designed to be like an intermediary platform. Like for example, like Airbnb. So for Airbnb is like you go to the site and you look up like bookings, like listings, but they will stop you from trying to go off the platform to like make an arrangement with like your host because that’s like, that’s, that’s exactly against their business model. Right. That’s like.

Jed Sundwall: Yes.

Jed Sundwall: Yeah.

Brandon Liu: So it’s for Airbnb, the entire point is like, they’re an intermediary between you, like your desire for a room and the host. Now, so I don’t think source is by design as like a data platform to be an intermediary for all data. There is a lot of like open data platforms in the past that have worked that way, where they make it very difficult for you to be able to consume the data outside of the platform. But it feels like with the sort of cloud native focus, part of the idea is that you’re able to

Jed Sundwall: Right, right.

Brandon Liu: you know, just like package up data and take it to go or access it just in chunks instead of having to be locked in to just using source. So if there was some way to maybe promote that as like a first-class way to consume source instead of just linking to assets, then maybe that would help alleviate some of these ideas around like cost sharing for bandwidth.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah, well, no, I mean, let me address this and then I want to acknowledge we have a viewer, Sig Till, I’m not exactly sure who they are, but they’re Sig Till on YouTube, who is joining us from Norway. So we were like, let’s do this at 4 p.m. Pacific. Sorry, everybody in Europe, but we’re doing it Asia Pacific, or at least, I mean, it’s what, it’s 7 a.m. where you are. So we’re kind of in a…

weird time zone right now, but we had somebody from Norway tuning in to ask what’s in the future for PMTiles and which changes would you like to see in the format itself or new tools that use the format? But anyway, Sig Till, just don’t go to sleep just yet. We’ll answer your question. The vision of source is not so much to be an intermediary. Source by design, it doesn’t really do much other than provide reliable access to objects.

So we call it, it’s a data publishing utility. It’s not an analytic tool. I’m happy to have, I want people to build stuff on top of source. So yes, I do want people to link to it. However, this is math that we, this is kind of my point in saying we have to do this analysis on our usage is to say, well, how much is that really gonna cost us if we do that? And are there ways for us to…

get a handle on bandwidth and usage so that we don’t, we’re not abused, you know, or rather, abuse isn’t the right term, but just so that we can afford to do that in a way that’s reasonable. And so, and to say like, look, if you don’t want to host your own object somewhere, which tons of people don’t, I mean, sort of a core tenet of the product design is that like, we just know that a lot of people don’t want to host their own stuff. Like they don’t want to run their own servers. They don’t want to think about infrastructure at all. If we can,

let them just link to reliable assets that are available. That’s great. But we have to figure out a way to do that in a way that doesn’t, you know, could scale to the usage of something like Google Maps without bankrupting us, you know? Then that means we have to figure out, for example, with like the open CORS policies, do we have to have some sort of way to say like, no, no, you have to be put onto an allow list?

Jed Sundwall: to be able to link to this or something like that. We’re gonna have to figure that out. So you’re right that I don’t want to be an intermediary. We’re not really trying to log people into source, but we do wanna provide a service that allows people to access data without having to download and re-serve their own copies if they don’t wanna do that.
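
Editor's note: the allow-list idea Jed floats here could look something like the following minimal sketch. It is an illustration only, not how Source's data proxy works; the origins and the `cors_headers` helper are hypothetical names made up for this example.

```python
from typing import Dict, Optional, Set

# Hypothetical allow list; these origins are invented for illustration.
ALLOWED_ORIGINS: Set[str] = {"https://example.org", "https://maps.example.com"}


def cors_headers(origin: Optional[str],
                 allowed: Set[str] = ALLOWED_ORIGINS) -> Dict[str, str]:
    """Return CORS response headers for a request's Origin header.

    An origin on the allow list is echoed back so browsers permit the
    cross-origin read; any other origin gets no CORS headers, so a
    browser will block scripts on that site from reading the response.
    """
    if origin is None or origin not in allowed:
        return {}
    # "Vary: Origin" tells caches that the response depends on the Origin.
    return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
```

A wide-open policy would instead return `Access-Control-Allow-Origin: *` for every request. Note that CORS only constrains browsers; it does not stop server-side clients from downloading or mirroring the objects.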

Brandon Liu: Right. I mean, on the other hand, I feel like, so part of the messaging is that just having object storage is a commodity. And in my experience, talking to developers that use PMTiles or that use other cloud data formats, a lot of people find using S3 very accessible, and it’s not a huge lift to ask them to be like, hey, go put this thing in your bucket. And it’s even among non-

Like I would say you could just be a front end developer. You could be someone that spends all their time doing TypeScript programming and knows nothing about like servers and you can figure out like object storage. So I think part of the selling point I’m trying to make is like exactly. Yeah. Like that audience I think is extremely large, of people for whom it’s too much of a lift to host something like a server.

Jed Sundwall: That’s my story. Yeah.

Brandon Liu: But just putting a thing in a bucket is actually like a very good experience. It’s very simple, it has a nice abstraction. And if you can sort of encourage the world to be more object storage-y, that’s the way I think about it. And that’s a big part of why I think PMTiles as a format has succeeded is because that audience is so large.

Jed Sundwall: Yeah, totally. I mean, so yes, I agree. I’ll just tell a bit of history. I’ve told this story a million times. I’ll probably tell it a lot as we keep doing this podcast, but like the story of the origin of the Cloud Optimized GeoTIFF and all this was when I found myself at AWS building this open data program and I figured out this one weird trick that I could just get the company to give out free S3.

but I had no engineers. I had no, like I was embedded within a sales organization. So like, due to like HR practices, like the idea of hiring engineers to build software or tools or anything was out of the question. And so I’m like, what can we get away with if we can only use S3? And I also being kind of, I guess I would say a front end guy, although I’ve never been like ever officially hired as an engineer, loved S3.

It’s like a very intuitive product, super powerful, very capable. I wasn’t afraid of it. And so I’ll say this, like you, very talented, smart person, knows how to use S3, isn’t afraid of it, and neither are your friends. There’s tons of people out there that are afraid of S3. Like Source, and actually I got to shout this out. We’ve been working with Development Seed on Source. Anthony Lukash, shout out to Anthony at Development Seed, has been

just cranking out new features on source. Today, we pushed out like you can upload stuff into S3 through the browser through source. For source users now, which you still have to be invited to be a source user, you don’t even have to use the CLI. You don’t have to, you don’t have to look at the AWS console. Like I’m just here to tell you there’s a whole universe of people out there that they’re like,

No, I am scared of S3. I’m scared of AWS. I don’t want to look at that console. And I saw somewhere some tweet that was like, it was in reference to Vercel or something like that, but it was just sort of like, it’s amazing how big of a business you can build just by building an abstraction layer on top of the AWS console. And so that’s really what we’re trying to do. And in fact, I do hope there will be people in the future. I mean, we already have a…

Jed Sundwall: a bunch of other organizations that are hosting their own PMTiles on source, they would rather put it on source than host their own S3 server. So, or rather like manage your own AWS account. So, I’ll leave it at that. Let me make sure, I’m hoping Sig is still awake in Norway. Do you want to take this question? What’s in the future for PMTiles?

Brandon Liu: What’s in the future? I would say the current version of the spec, version three, is done. There aren’t any plans for a version four right now. And I think I kind of got lucky in that sense. Like someone at a conference last month in Japan, they asked me, like, do you have any regrets about like the format design right now? And I’m like, I thought about it. I’m like, not really. It’s not perfect. Like the design overall has very specific trade-offs, you know?

Jed Sundwall: Okay.

Jed Sundwall: Okay.

Brandon Liu: Like it’s, almost stupidly simple in some sense. And like, I didn’t want it to like get too carried away. I didn’t want to like embed CRS information and that kind of thing. I would say the lowest hanging fruit for PMTiles is better compression methods, but that’s blocked on browser implementations. Like, so browsers only support gzip for the DecompressionStream API. If that supported something like Zstandard,

that would be great, but that is blocked on Apple, Microsoft, Google implementing Zstandard support. What changes would I like to see in the format itself? The format itself is, right now it’s good enough for static data. I would really like to see another format emerge that is for dynamic data that is still like S3 optimized,

that handles rapidly changing data. Because right now, if you edited some geodata and created a PMTiles, you’d have to replace the whole file on object storage. And that is a huge trade-off. Thankfully, a lot of the data out there is you can generate this building data set once. And maybe once a month, you run a new job and it generates a new one. Each time you are replacing it.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah. Yes.

Brandon Liu: What I really want to see is a cloud native storage engine for real-time data. That would be a totally different design than PMTiles, but I think it’s still possible to do a cloud native thing on S3, for example, where maybe you have data in chunks, and then those chunks are addressed by a hash. And then you have a header that is just a reference to hashes. And then as you upload new data or data changes, you create new chunks and reference those.

and then garbage collect them. So I would like to see some other new format separate from PMTiles that addresses real-time data. In terms of new tools for the format, sort of along this line, one experimental tool I have for PMTiles is a way to do deltas. So you have to replace a PMTiles on S3 each time. But I was thinking about a way to rsync data.
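
Editor's note: the chunk-and-hash design Brandon describes a moment ago (chunks addressed by hash, a header that only references hashes, new chunks on change, then garbage collection) can be sketched as below. This is a toy in-memory model under stated assumptions, not an existing format or tool; `ChunkStore` and its method names are hypothetical, and a dict stands in for objects in a bucket.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Toy content-addressed store: chunks keyed by hash, plus a manifest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # stands in for objects in a bucket
        self.manifest: List[str] = []       # the "header": ordered chunk hashes

    def put_chunk(self, data: bytes) -> str:
        """Store a chunk under its SHA-256 digest; identical data dedupes."""
        key = hashlib.sha256(data).hexdigest()
        self.chunks[key] = data
        return key

    def publish(self, parts: List[bytes]) -> None:
        """Upload chunks first, then swap the manifest to reference them."""
        self.manifest = [self.put_chunk(part) for part in parts]

    def read(self) -> bytes:
        """Reassemble the current dataset by following the manifest."""
        return b"".join(self.chunks[key] for key in self.manifest)

    def garbage_collect(self) -> int:
        """Delete chunks the manifest no longer references; return the count."""
        live = set(self.manifest)
        dead = [key for key in self.chunks if key not in live]
        for key in dead:
            del self.chunks[key]
        return len(dead)
```

Publishing `[b"a", b"b", b"c"]` and then `[b"a", b"B", b"c"]` uploads only one new chunk; the stale chunk for `b"b"` stays readable until `garbage_collect()` runs, which is what lets readers holding the old manifest finish their reads.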

Like if you have like a 200 gigabyte PMTiles on the cloud, and then you have 200 on your desktop and they’re mostly the same, but one part is changed, you can use an algorithm like rsync basically to just fetch the parts that have changed. So that’s like one way from like the cloud to your computer, not the other way around. But I would like to see some use cases for that because I sort of built it as an idea.

But there’s not really a strong compelling use case right now. So that’s, those are a lot of my ideas for the PMTiles ecosystem right now.
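
Editor's note: a heavily simplified sketch of the one-way delta idea follows. It compares fixed-size block hashes and fetches only the blocks that differ. This is not Brandon's experimental tool and not the real rsync algorithm (rsync uses rolling checksums so it can also cope with inserted or shifted bytes); `block_hashes`, `sync`, and the tiny `BLOCK_SIZE` are assumptions for illustration.

```python
import hashlib
from typing import Callable, List, Tuple

BLOCK_SIZE = 4  # unrealistically small so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def sync(local: bytes,
         remote_hashes: List[str],
         fetch_block: Callable[[int], bytes],
         block_size: int = BLOCK_SIZE) -> Tuple[bytes, int]:
    """Rebuild the remote file locally, fetching only blocks that differ.

    Returns the rebuilt bytes and the number of blocks that were fetched.
    """
    local_hashes = block_hashes(local, block_size)
    rebuilt = []
    fetched = 0
    for index, remote_hash in enumerate(remote_hashes):
        if index < len(local_hashes) and local_hashes[index] == remote_hash:
            # Block is unchanged: reuse the local copy.
            rebuilt.append(local[index * block_size:(index + 1) * block_size])
        else:
            # Block changed or is missing locally: fetch just this block,
            # e.g. with an HTTP range request against object storage.
            rebuilt.append(fetch_block(index))
            fetched += 1
    return b"".join(rebuilt), fetched
```

In practice the list of remote block hashes would itself have to be published next to the archive, and `fetch_block` would issue a range request for that block's bytes.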

Jed Sundwall: Okay, I love that.

You’re unearthing some feelings about source and like, you know, so we’re trying to, I want source to be kind of this like one-to-one proxy for S3, but the idea being that we can create durable URLs that are undergirded by

as many object stores as we want. So like if you have an object, you should be able to mirror it in lots of different regions and across clouds. And if you have your own S3 compatible object store, like we should be able to point to it and stuff like that. But a really interesting thing happened. If you go to, you’ll have to look around on this, but like the data.source.coop repo on GitHub, which is the repo for our data proxy, this guy Sylvain Lassage, who we’ve been working with on viewers,

You’ve encountered him on GitHub. He’s like, it’s weird. Hugging Face can stream CSVs, but S3 can’t. And he looked into it and it had something to do with some header stuff that I don’t remember the details of. But it was like an easy add to the proxy that was basically just like it would pass some more information in the header when you’re calling the CSV and you can stream the CSV. And so like,

We have, we’ve crossed that line. It’s like, it’s sort of like, we’re going to do something the S3 API doesn’t do. And I can see us going down a path where we are

Jed Sundwall: more than just like a very simple abstraction on top of S3, but we’re extending what object stores can do. So we should keep talking about that.

Brandon Liu: Right. And also like, so going back to the idea of like a top down versus a bottom up standard. So S3 has become like a de facto standard, like a totally undocumented standard where every other vendor like sort of only implements the features they need to be S3 compatible. And if something is like wrong or like broken, they’re like, well, that’s how S3 works, you know? So it’s sort of become this, this odd thing where this quirky design that Amazon came up with.

Jed Sundwall: Yeah. That’s right. Yep.

Jed Sundwall: Right. Right.

Brandon Liu: is now like what everyone has to do de facto because all the tooling is built with those assumptions that like this API, this XML API exists. They’re trying to do new things though with like there’s that like S3 Express One Zone that works differently. There is I think a new way to do like partial uploads. Like you can define an upload as being copied from a different object and that’s like accelerated.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Brandon Liu: But yeah, like it would be cool if some other company came up with like an actual, like maybe a more, like a more featureful spec for S3. But again, probably why it succeeded to the point it has is because it’s so simple. It’s like dumb, you know, there’s no really fancy, there’s no fancy semantics around like content hashes and stuff. Like if you look at how Google storage works, you know, it does seem like they had some, you know, some…

Jed Sundwall: Yeah?

Jed Sundwall: Well, right.

Brandon Liu: whatever like level seven engineers sit in a basement for months and like come up with some cooler design that is like more correct or that is more scalable. So there are platforms like Google storage that seem to have more sophistication than S3, but they don’t have the adoption of S3 in terms of the API, not the specific Amazon platform, but like the API, the interface. And I think that is like a fundamental thing, which is there’s always gonna be this trade-off between like,

Jed Sundwall: Yeah. Yeah.

Brandon Liu: the simpler and dumber you make it, the more likely it is to thrive, you know, like thrive organically, in terms of people being able to write their own implementation, people writing tools. That I think is also like the trade-off between something like PMTiles, which is, you know, like I keep saying, it’s simple and dumb, versus something that is more full-fledged, like a server application that serves WMS tiles, for example.

Jed Sundwall: Right, yeah, I mean, so we just have to be very careful with how we go about this. I imagine you’re familiar with the concept of pace layering or pace layers. You heard of this? Yeah, so I’m putting another, I’m just gonna be putting stuff in the chat. This is, it’s an idea I think Stewart Brand came up with, which is basically the notion that like you,

Brandon Liu: I don’t think so.

Jed Sundwall: that society, like the world, like society is our experience as humans moving through the world. It’s based on all these things that are moving at different rates. like nature undergirds everything. And on top of that, we have all kinds of different life forms and then humans have developed culture and governance and law, language itself. But these are all layers that like they evolve at faster and faster rates.

The funny thing is like sort of the top layer of the pace layer diagram is always like fashion, which is like all over the place. Like fashion is this kind of like unpredictable crazy thing that humans do, but that’s based on these other more sort of like foundational things like markets and law and language and blah, blah. And so that’s how I, so, I mean, I was at Amazon for eight years. Like, and I totally bought into

the philosophy of AWS, which is to provide primitives, to provide primitive services that are reliable and are effectively, extremely durable. We had an AWS crash quite recently. things go wrong, but it’s pretty remarkably stable service in terms of like how complex it is and how much stuff it supports. But the way they do that is by being very primitive.

I would say there’s, to your point, there’s obviously room to extend that. And I think the right way to go about it or to think about it is to extend on top of the primitive. But to go slowly, you wanna add layers very carefully on top.

Jed Sundwall: All right, let’s see here. I make sure that we’re… I’m figuring out this chat stream thing. I can see it here in Riverside. Sorry, everybody out there, but we’re still figuring out how to do this thing. So I’m curious to get your… I mean, when did you realize you could just do a really huge file? Just like one gigantic file.

Brandon Liu: So I started Protomaps, the project, before I created PMTiles. And the original plan was to have a server, like a server process, that like served tiles out of a database. So the original design was like not like, it was not like cloud native or cloud optimized at all. It did not use range requests. It was like a, it was still one file

Jed Sundwall: Yeah.

Brandon Liu: that you like stored on a server and you had to like run this program to be able to like serve it over HTTP. And then I like, I eventually figured out that I could sort of cut out that entire part just by making it something that you could put onto like on S3 as a static file. So that actually came in probably like one or two years into the project.
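
Editor's note: the shift Brandon describes, from a server process to a static file read with range requests, comes down to fetching "length bytes starting at offset" from one big object. The sketch below shows that access pattern against a local file-like object and builds the equivalent HTTP `Range` header. The archive bytes and the `index` mapping are made up for illustration and are not the PMTiles directory layout.

```python
import io
from typing import BinaryIO, Dict, Tuple


def range_header(offset: int, length: int) -> Dict[str, str]:
    """Build the HTTP Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the "- 1".
    """
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_range(f: BinaryIO, offset: int, length: int) -> bytes:
    """Local equivalent of a range request: seek, then read."""
    f.seek(offset)
    return f.read(length)


# A pretend single-file archive: a 6-byte header followed by two "tiles".
archive = io.BytesIO(b"HEADERtile-Atile-B")

# Hypothetical index mapping a tile name to (offset, length) in the archive.
index: Dict[str, Tuple[int, int]] = {"a": (6, 6), "b": (12, 6)}

tile_b = read_range(archive, *index["b"])
```

Because the same offset and length work against a local file or an object store that honors `Range`, the archive can sit on S3, on a laptop, or on an SD card without a tile server in front of it.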

For me, it’s so in a lot of cases, like that idea of being locked into using the server process to like serve the tiles, that is sort of like a feature. Like for most businesses, like, like if you have to run it on server, that creates like lock-in, you know, and you can monetize that. You can add like, you can add a paywall. You can say, Hey, like, so if you want to like be able to access this thing, it goes to the server. Just like get this API key, you know, once you go, once you go over like 10,000.

Jed Sundwall: Exactly, yes.

Brandon Liu: request, then you can pay like a subscription, like pay as you go. So that’s like a feature is to be able to like have it be a, a file like on a server versus just a single static object. but then like, once my, my thinking around like, okay, well like, you know, what is the long-term way this project succeeds? I’m like, you know, isn’t it more interesting to have it just be this like single object?

that you can copy around, like as if it was a video. So right, the original like motivation for the project was coming from like being able to create custom maps and host them yourself. Just the nature of how that was hosted evolved from being a traditional like sort of SaaS-y server thing to being this like object storage focused thing later on.

Jed Sundwall: Okay, fascinating. Yeah, I mean, the…

this notion that if you control the server, if you have to be this intermediary, you get to control the data flows and also the users. I was thinking like studying Netflix is a really interesting thing to do if you think about like a data business. Netflix is a data business. They sell subscriptions to data. And the way they’re able to do that is by controlling the entire interface, like the entire chain. And so you have to go through them and pay their subscription and…

experience, you know, have the Netflix experience, which is good. You know, the fact is like they provide, there’s a huge audience for that kind of data, which is videos that people like to watch. And they’ve just nailed the experience and people are happy to pay for that. You know, whereas like, there are certainly people out there that are like, nope, like you have to have your own DVDs or I’m going to run my own local NAS with a bunch of my own video files because I want to have control. But most people are like, whatever, I don’t want to have to think about this. And so.

So all I’m saying is like, I’m underscoring the point that like, there is a business in providing that kind of service to people, but the market for maps is way too small to justify that kind of thing. That’s why I think so many geospatial like SaaS companies have had such a hard time because they might be able to provide a great, great experience to get some vectors and rasters and stuff delivered over their interface, but like,

the market for it’s just way too small to justify it. Anyway, I’m a fan of your approach for obvious reasons. And I’m sorry, let me just keep going because Rachel Googler on LinkedIn asked, this is relevant to this. She said, with the AWS outages last week and Azure issues today, which I didn’t know about, we’ve seen how reliant we are as a society on centralized cloud infrastructure. How can cloud native formats be used in temporary local area

Jed Sundwall: or peer-to-peer networks when that centralized connectivity is gone, such as during natural disasters? I think you kind of answered her question right away, but do you want to address that kind of idea directly? Like how you think about this?

Brandon Liu: So I think of the Protomaps project as something that works on a server or works on S3, but also as something that works on an SD card. It’s like, if you can put a map or you can put a dataset from source, like a scientific dataset onto an SD card and carry it into the forest, then that is like…

That’s good enough, right? That’s how most technology should work. That’s how videos work. That’s how Word documents work. So I think once you’ve built the primitives, it addresses a lot of these questions about like portability and being able to be resilient against like certain failures of networks, for example. There are some interesting things around peer-to-peer. I know one of the contributors to PMTiles was like,

playing around with IPFS, which is like this distributed storage system, like where everything is like addressed by hash. I think it’s cool. I don’t know a lot about it, but I’m happy to hear that just designing like a simple single file format can be directly applied or like it just works with these things like IPFS. And…

Jed Sundwall: Yeah. Yeah.

Brandon Liu: I haven’t seen a lot of adoption for that specific peer-to-peer system outside of some more niche use cases. But in theory, so you could build a really resilient network of storage for any kind of data as long as what you’re trying to serve is just these simple files.

Jed Sundwall: Yeah, yeah, well, I mean, and again, I mean, I think the sort of the Netflix example is a good one to explain this, like to highlight also the sort of Rachel’s point of like these single points of failure that can occur where like if you are relying on one system to be able to deliver content like in a very specific way, if that system is brittle, it goes down for any reason, like you’re hosed, but this is the…

This is core to the file-based approach to data architectures, or what I would say specifically the object-based approach, because I like object storage, is that resilience in the face of a system going down to your point, like you can put on an SD card and take it into a forest, that’s perfect, that’s a great way to think about it. There’s kind of no way of getting around the power and effectiveness of sneaker net. However, this opens up the…

the door to a question that I’ve had about PMTiles is that you’ve created PMTiles as this format. If you give, so if I show up with a PMTiles file on an SD card and give it to a random person, they will not be able to open it. They’re gonna double click on it and be like, what is this? How do you get away with that? I mean, yeah.

Brandon Liu: Yeah.

Brandon Liu: Yeah. I think it’s tough because like, it sort of depends on the observer, right? Or the person opening it, are they opening it on Android? Are they opening it on Windows? Can I go talk to Apple and ask them to put a PMTiles viewer into macOS or something? And I think like my solution is this web viewer. There’s a website called PMTiles.io that I maintain where you can just like drag and drop

Jed Sundwall: Right.

Brandon Liu: a local PMTiles file or a URL of a PMTiles on the cloud. So the sort of intention was that viewer emerged at the very beginning. There has to be essentially a file preview for these things that works locally too. You shouldn’t have to spin up a web server to be able to look at something. So the thing about data is people want to look at it. People don’t believe that it exists until they can see it.

It’s just like this inherent bias. So we know the machine can read it. People don’t trust it until they can look at it. And that is a lot of why people care about PMTiles overall is because they might have geo data in some format, but if they want to visualize it, they have to turn it into some more visualizable format. And that’s really what PMTiles is, is making visualization easy. So the answer for the web viewer is as long as they have a copy of that

web viewer, which is open source, on the USB stick, then they should be able to open that offline in a browser and just like open up that PMTiles file. That viewer is built using all like pretty standard web stuff. It uses MapLibre and some like browser APIs.

Jed Sundwall: Right. But is that all built? Can that viewer be… that all be… This is a very naive question. Could you just have like an HTML file on that stick that contains the entire viewer?

Brandon Liu: And a JavaScript bundle. Yeah. There is a static build of it, cause it’s hosted on GitHub Pages actually. And GitHub Pages is just static files. So you could just like clone down a copy of that HTML, JavaScript, CSS bundle and have it offline and that should work. There is this like interesting question though of like, okay, like, there’s certain like formats like for archiving that are like, I think it’s like the Library of Congress. They have like standards about like,

Jed Sundwall: Yeah, okay.

Jed Sundwall: Okay. right. Yeah.

Jed Sundwall: Yeah. Yeah.

Brandon Liu: they recommend JPEG as a format because it’s like based on the likelihood of like in like 50 years, there’s like some like library science people that are like, like we have these like historical like scans of like restaurant menus, but how do we open them? Because there’s like this, there’s this image format that like was popular, you know, back in the, in the two thousands and now nobody can read it. So there’s like this open question of like, you know, is,

Jed Sundwall: Right.

Jed Sundwall: Right.

Jed Sundwall: Yeah.

Brandon Liu: is PMTiles like a resilient format by that standard of measure. And I think that the way the format is designed, it could fit on one page. You know, it’s like, like I know people that have written like an implementation in a different language, like Rust or Swift or something, and they can do it in like a day because the format is intentionally like, like as simple as possible, like going back to

Jed Sundwall: Yeah.

Right.

Brandon Liu: that QOI format, just like, it needs to fit on like one PDF page. It can’t be like a white paper, like 200 page book to be able to write a reader. So like my hope is that even if all of, you know, if GitHub, you know, like it’s blasted into the sun and we lose all the code, but you have to like write a reader for PMTiles, like from scratch. And all you have is the spec. I don’t think it’s that hard. It should be doable.

So even if you didn’t have like that web viewer or a thing on a USB stick, you could figure it out.
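
Editor's note: as a sense of scale for "write a reader from scratch," here is a minimal sketch of the very first step, checking the file signature and reading two header fields. It assumes, from the editor's reading of the published version 3 spec, that the file begins with the 7-byte ASCII magic `PMTiles`, then a one-byte version, then little-endian 64-bit integers for the root directory offset and length. Verify those offsets against the spec before relying on them; `parse_header_prefix` is a hypothetical helper, not part of any PMTiles library.

```python
import struct
from typing import Dict


def parse_header_prefix(buf: bytes) -> Dict[str, int]:
    """Parse the first 24 bytes of what is assumed to be a PMTiles v3 header."""
    if len(buf) < 24:
        raise ValueError("need at least 24 bytes of header")
    if buf[0:7] != b"PMTiles":
        raise ValueError("not a PMTiles archive: bad magic number")
    version = buf[7]
    # Assumed layout: two unsigned 64-bit little-endian integers at byte 8.
    root_dir_offset, root_dir_length = struct.unpack_from("<QQ", buf, 8)
    return {
        "version": version,
        "root_dir_offset": root_dir_offset,
        "root_dir_length": root_dir_length,
    }
```

A full reader would go on to decode the directories and fetch tile bytes by offset and length, but the point stands that the entry point is a handful of fixed-width fields rather than a large container format.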

Jed Sundwall: Yeah. Amazing. This is, I mean, this is great. We’ll be announcing this right away, but the next episode of Great Data Products is with, we’re pretty sure it’s going to be with the Harvard Library Innovation Lab. It’s the Harvard Law School Library Innovation Lab. So where I found like my kind of librarians, you know, that are thinking a lot about, you know, understand the benefits of object storage and these, you know, primitive commoditized layers of storage, but they have a lot of thoughts about this and

we’re talking about many different types of content, but I think, I hope I want to make sure they, they hear this because your thoughtfulness on this, I think is like really, really great. I mean, thinking, you know, the tagline of this podcast is the ergonomics and craft of data. And you’re thinking so far ahead, like, what are the ergonomics of like finding a PMTiles file in the like rubble left after the nuclear like winter and people be like, actually I can figure this out.

What, yeah, great experience you’re thinking of for the future archaeologists. Have, yeah.

Brandon Liu: Right. So just as a comparison point, like it’s probably fine to like sort of bash on Esri stuff here, or it’s not bashing on it, but even like a file geodatabase, which is like the FGDB format. There are city governments that publish FGDBs and they expect you to open them. And like most developers that are not in the Esri ecosystem cannot open these files.

Jed Sundwall: Yeah.

Brandon Liu: Like I think like it might’ve been like in New York City, like they distribute their like road network as an FGDB. And you know, that format was maybe designed like 15 years ago. And even then most people I talked to are like, what do I do with this file? I have no idea what to do with it. So that’s like an extreme example of like, well, you know, it’s not even a question of like 50 years of like…

of being able to open the file like in 50 years, it’s a question of like even five years later after you publish it, can anyone deal with this thing? And it’s like, well, not really. I think it’s like kind of proprietary or maybe there is some spec, but even things like shapefile, like shapefile like was proprietary from the very beginning, right? And then people sort of like kind of made some, made some like reverse engineered like readers for shapefile.

Jed Sundwall: Right?

Brandon Liu: And even then there’s like undocumented extensions for like doing indexing and stuff on top of shapefile. But it’s like all those things are, I think they sort of like fail this question of like that library test. Like are people going to adopt this if they are thinking about things, like if they’re trying to preserve things like for the future.

Jed Sundwall: Yeah, absolutely. I mean, this is, you’re thinking the right way, you know, and what’s interesting is that like, Jackson says, GeoPackage. That’s, yeah, there’s an answer there. Yeah. I mean, what’s remarkable about,

Brandon Liu: GeoPackage, yeah.

Jed Sundwall: about the, I mean, just thinking about this, like just how short the history of the internet and computing really is, you know? And so it’s fun to think about what things will be like a hundred years from now or whatever. But like we went through a blip, I would say, where people were like, oh yeah, the way to control the market is by controlling the standards. You know, Microsoft did that very effectively and developed incredible network effects through the DOC and, you know, XLS formats

that have since been effectively opened, but who cares? By this time, the damage is already done. Everybody uses Word and Excel, which I should also say, I’m not mad about. I think they’re great, obviously powerful tools that everyone uses. It’s technology that’s well distributed, so I’m not mad about that. But in the future, we have to think more about exactly what you’re saying, which is just sort of like, how durable is this going to be, really?

And that means being very thoughtful about how you design the spec. And it’s usually gonna be something simple. The only other thing I’ll say here is that like, I don’t wanna seem like I’m picking on PMTiles, ’cause like if I double click on a PMTiles file, nothing will happen. The same is true for Parquet, right? And so Parquet is like all the rage. So much data on Hugging Face right now is in Parquet. We love having tons of Parquet data on Source.

And I was showing a guy earlier today who’s not really familiar with it, but I opened it up on Source and these are my favorite demos. My PMTiles demo is the best demo on Source, because we’ve got a great viewer built in and you can just look at it and it’s easy for people. Thank you for that viewer, the viewer that you created. And then Sylvain also built this Parquet viewer and it’s like, great, like now, you know, I mean, as of today, somebody can drag and drop a Parquet file into Source

and they can look at it in the browser right away. And I showed this guy, I’m like, yeah, here’s a Parquet file, it’s 800,000 rows. And it’s just like streaming right through really easily. And we’re already at a point where there’s so much data out there and so many files are being adopted that like, no one’s even bothering developing a desktop viewer for them. It’s all being done in the browser. Like the expectation is that it’s all gonna be done over the internet, which is amazing.

Jed Sundwall: We got some comments coming through. Yousef from Egypt. Hello, who knows what time it is over there. He says new versions of GDAL can open up FGDB now.

GDAL for the win.

Brandon Liu: I think I saw that. Yeah. I think like my standard workflow now is like I download like the FGDB of like, it’s like New York City road centerlines. And then I do like an ogr2ogr and just get it into like a GeoJSON or something. But yeah, I believe there is a solution now. I remember, I think there was one time, like a decade ago, where I like downloaded like the ArcGIS Pro trial and like activated the trial just to be able to like open

the FGDB and then like save it out as something else. But I think that like the status quo is better now. Yeah, for sure.
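The FGDB-to-GeoJSON workflow mentioned here can be sketched in Python by shelling out to GDAL's `ogr2ogr` tool. This is a hedged sketch: it assumes `ogr2ogr` is on the PATH and that the installed GDAL build can read file geodatabases. The file names and the helper `fgdb_to_geojson_cmd` are hypothetical.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def fgdb_to_geojson_cmd(
    src_gdb: str, dst_geojson: str, layer: Optional[str] = None
) -> List[str]:
    """Build the command: ogr2ogr -f GeoJSON <dst> <src> [layer]."""
    cmd = ["ogr2ogr", "-f", "GeoJSON", dst_geojson, src_gdb]
    if layer is not None:
        # Restrict the conversion to a single named layer.
        cmd.append(layer)
    return cmd


SRC = "centerline.gdb"      # hypothetical input geodatabase directory
DST = "centerline.geojson"  # hypothetical output file

cmd = fgdb_to_geojson_cmd(SRC, DST)

# Only run the conversion when GDAL is installed and the input exists.
if shutil.which("ogr2ogr") and os.path.exists(SRC):
    subprocess.run(cmd, check=True)
```

Note that `ogr2ogr` takes the destination before the source, which is easy to get backwards.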

Jed Sundwall: Yeah. Yeah.

Yeah, I mean, GDAL, it just…

Shout out to Evan. A few more comments on YouTube. Jackson, hello Jackson. He says he’s in the midst of writing an implementation of GeoPackage in Julia. Good luck. Let us know. If you want to write about that on the CNG blog, we have a process for submitting stuff to the blog. That’d be cool. It’s 2:52 in the morning where Yusuf is. Brandon, you are very popular. People are like, this is incredible.

The sun never sets on the Brandon Liu Protomaps empire. And then we’ve got Sigtil again from Norway, staying awake. I love this, this late night energy we’re getting. Asking, how do you see the new kid in town, GeoParquet, versus PMTiles? They have some of the same properties and some differences also. As you said, there’s a lot of new players. Yeah, so yeah, Zarr, COG, FlatGeobuf.

Brandon Liu: Cheers.

Jed Sundwall: You’ve explained this to me before, sort of the nuance between like what PMTiles does as opposed to what GeoParquet does. I mean, I have my own guesses about this because it’s, GeoParquet is like more about like data than PMTiles, which is more about viewing. Is that how you would describe it? Or what’s your response there?

Brandon Liu: That’s how I see it. Yeah. Like, so I make the distinction between like a format that is for analysis versus a format that’s like for visualization. And I think that’s like, maybe not intuitive because in some cases, those are the same. Like for a COG, viewing it and analyzing it are sort of the same because analyzing it means like, what is the value at this pixel? And viewing it is like, show me the raster, you know, colored in some way.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Brandon Liu: For PMTiles, a lot of the use cases right now for PMTiles are vector-based. And for vector, you sort of need to split out the analysis and visualization into separate things. Because if you wanted an overview for a vector dataset, you can’t really show everything. It would be too noisy. So PMTiles is inherently generalized. Like it has like an overview pyramid.

Jed Sundwall: Yeah.

Brandon Liu: So you can load it at any scale and it looks correct. But what you actually see at that level is like not, is not everything. You have to do some filtering down of the data. Sort of like for, for COGs, you have to build overviews that are like smaller and smaller downsampled resolution, like images of the full thing. So GeoParquet does not have a lot of use case overlap with PMTiles because GeoParquet is like,

an analytical format that is all, it’s just like all the raw data and then only one version of each data point. While PMTiles will have copies of a single data point because it has to build those overviews. Now there are like approaches to using GeoParquet and visualizing it directly. Like for example, so there’s a project called Lonboard that lets you like just show,

Jed Sundwall: Right, right, right.

Brandon Liu: GeoParquet on a map. Whether or not that’s practical to use on the web really depends, because if you want to be able to download an entire GeoParquet dataset to visualize a city, that might be 200 megabytes, which is more than people usually expect for a single web page. I mean, it’s possible that in 10 years, bandwidth will be so fast and cheap

that downloading 200 megs for a single webpage might not matter. And maybe we like at that point, we don’t actually need like a visualization format. We can just be downloading raw data like everywhere. But I expect like some sort of strategy around being able to visualize data with overviews is always going to be necessary just because like some datasets are just really big. Like there’s building datasets on source that are like, like maybe half a terabyte, like they’re like open buildings datasets.
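The earlier point about overviews, that a tiled archive stores copies of a data point across zoom levels while an analytical format stores it once, can be made concrete with the standard Web Mercator tile formula. This is a minimal sketch assuming the common XYZ tiling scheme; `lonlat_to_tile` and `tiles_for_point` are hypothetical helper names, and real generators usually drop or simplify features at low zooms rather than keeping every point everywhere.

```python
import math
from typing import List, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) tile containing a lon/lat point at a zoom level."""
    n = 1 << zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still land in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_point(lon: float, lat: float, max_zoom: int) -> List[Tuple[int, int, int]]:
    """List the (z, x, y) tile that would hold this point at each zoom."""
    return [(z, *lonlat_to_tile(lon, lat, z)) for z in range(max_zoom + 1)]


# A point kept at every zoom from 0 to 14 is stored in 15 different tiles.
copies = tiles_for_point(-122.33, 47.61, 14)
```

The count of tiles per point is the storage cost of the overview pyramid; the benefit is that any zoom level can be drawn from a small, bounded amount of data.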

Jed Sundwall: Yeah.

Jed Sundwall: Yeah, the VIDA datasets, those are my favorite demos. They’re like 300 gigs or 230 gigs or something like that. Like, yeah, it’s like, it’s only going to be streamed.

Brandon Liu: Yeah.

Jed Sundwall: My assumption is that storage will keep getting cheaper. There’s still plenty of room to progress in terms of the cost of storage itself, but bandwidth, networking has actual physical limits in terms of the speed of light that I think are really compressing space like that. The movement of bytes over space or across space is really hard.

One, actually Qiusheng Wu, awesome to have Qiusheng on here, says that DuckDB supports serving vector tiles through Parquet, so they’re on LinkedIn. So, cool. It’s great. And then we have another, I wanna talk to you about the Hilbert curve. We’re getting at about an hour, so we can maybe start wrapping it up. But then Alex Kovac asks, and I’m gonna test this out, I’m still figuring out how to do this. You can see it, okay, so.

Brandon Liu: Nice.

Brandon Liu: I see it.

Jed Sundwall: How did, I think the people on LinkedIn can’t see this. So this is tooling on PMTiles. And also for the purposes of the people listening after the fact. Alex says tooling around PMTiles such as the viewer, CLI, tippecanoe, basemaps package, et cetera, is super convenient. How did that evolve? And do you think there’s anything big missing? Yeah.

Brandon Liu: Yes. I think the part I put the most thought into was the overall developer experience of using PMTiles. And from the beginning, it had to be like a single binary you could just download. I did not want you to have to homebrew install or npm install or Python package install a package, just because that’s going to fail for a lot of people.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Brandon Liu: If you’ve ever been to a workshop where people like use Python, like a scientific workshop where people are like, we’re providing the material as like a Jupyter notebook. And then someone’s like, I’m on Windows. And then you’re like, just use Conda. And then you’re like, trying to like fiddle with this, this Conda setup. And I’m like, I don’t, I just don’t like, like, I feel like it, like it pushes people away. Like I understand that like that tooling is mature, but for me, it’s like, I think the best developer experience for any sort of data tooling is like.

Jed Sundwall: You’re right.

Jed Sundwall: Yeah.

Brandon Liu: just download a single binary. Like those are the tools I see having the most adoption and least problems in terms of like the installation. So the installation has to be super simple, like a single download. The viewer we talked about is like the web viewer, is for like, the viewer for PMTiles files to just browse them. I would say if there’s something big missing, I think tippecanoe is great

and PMTiles support for that is built in, thanks to Felt. But I would say it’s still too hard to install. Like a lot of people that want to build PMTiles, they get stuck on like, how do you install the vector tile generator? I would say that is the biggest missing piece, which is to have a single binary download vector tile engine.

Jed Sundwall: Okay.

Brandon Liu: Like a lot of the limitation for that is because the libraries you need to do geometry, like geometry processing, are generally only in a couple of languages, like C++, Java. And right now the CLI is like a Go program and there’s no good libraries for that in Go. Even Rust doesn’t have that great of support. You probably need to bring in GEOS via C++ bindings. So the biggest missing part is still like some…

Jed Sundwall: Yeah.

Brandon Liu: easy-to-install generator for vector tiles that can handle a large amount of data. It’s something I do want to work on, but right now I think the tippecanoe solution is good enough. But it’s the major pain point for using PMTiles.

Jed Sundwall: Yeah. I mean, talk about ergonomics of data. The way you think about this is so great. Everyone, learn from Brandon. You’re so thoughtful. This is also just kind of like, I see this as you’re helping level up the species just by thinking through things this way. Because yeah, it’s so goofy. I mean, I’ve been in all these hackathons in these rooms where people are like, yeah, like.

you end up spending half the time debugging people’s Python installations. it’s just like, no, there’s got to be a better way. Yeah.

Brandon Liu: Right. There’s also this idea of different kinds of complexity. There’s like inherent complexity versus incidental complexity. And I think a lot of solving these pain points is around solving incidental complexity, which is just complexity that happens to be there as an artifact that is not related to the actual problem we’re solving. Like maybe you’re trying to solve some route optimization problem. And that is it’s…

like is inherently an interesting computer science problem. But then the, the incidental part is like, I need to like install these packages with Conda and Conda, like, doesn’t like the version on my machine or something. And it’s just like, all that stuff is just like the part that is like, we can really like, we have to eliminate that in order to actually get to working on the hard problems.

Jed Sundwall: Right.

Jed Sundwall: Yeah, exactly. There’s, what’s the line? It’s sort of, you make the hard stuff easy and the like impossible stuff possible or something. There’s some axiom around like, you know, guiding software development along these lines, which is like, we should be continually progressing in that direction. But you’re asking all these great questions or like framing it in the right way, which is just sort of like you imagine somebody who’s coming to a hackathon,

how quickly can you get them up and running? If you’re gonna take an SD card into the forest, what can you actually do with that, realistically? And I often think in terms of, this is what I was saying before about Excel and Word being very successful, is that they are sufficiently distributed technologies. The whole idea that the future’s already here, it’s just not evenly distributed. There are some that are evenly distributed, like spreadsheet software.

Like everyone can open a CSV. Like that’s awesome. CSVs are great, like because of that. But you know, as we’re getting better at producing more complex forms of data, we need to think about the ergonomics in that way. Like what are the experiences of people being introduced to this? So, Yusuf says that tippecanoe on Windows is a nightmare, by the way. So FYI.

Brandon Liu: I’ve heard that as well. Yeah. Yeah, I’m aware.

Jed Sundwall: So I remember years ago I asked you if you’d ever seen the movie Tár.

Brandon Liu: which I still haven’t, but I need to now that you’ve mentioned it twice.

Jed Sundwall: Okay, well, I’m just like, it’s a, Tár is a weird, Tár fans, come out and tell me if you’ve watched the movie Tár. It’s Tár with an accent on the A, it’s a Todd Field movie, in which David Hilbert is a character of sorts. Like he just shows up in the background and I think there are references in the movie to the Hilbert curve.

Tell me about the Hilbert curve. Let’s close on this. Why the Hilbert curve and how did you get into space filling curves? I love this stuff.

Brandon Liu: I kind of ripped it off of S2. So S2 is Google’s geospatial indexing library, and they use the Hilbert curve there. It has some nice properties that make it work well for geodata. And the motivation behind this is even in Cloud-Optimized GeoTIFF,

Jed Sundwall: Okay. Yeah.

Jed Sundwall: Yeah.

Jed Sundwall: Okay.

Brandon Liu: People argue about like, like, so we’re making like a cloud, like a cloud optimized format, but like how big should the blocks be? You know, you’re like fetching blocks. If you have small blocks, those are good for certain use cases. If you have big blocks, those are good for like, for more like bulk downloading use cases, it’s more efficient. And there’s some trade-off between small blocks and large blocks. But the Hilbert curve is like a way to like, it’s like a lazy way to get around that argument.

which is because like it’s both small blocks and big blocks in the same, like in the same format. You can actually have any size block as long as it’s a power of two. And the reason this is good for PMTiles is because one of the operations on PMTiles is for extracting one part of the world from a larger file. And the imagined use case for this is, so I host my OpenStreetMap dataset on the cloud.

Jed Sundwall: Yeah.

Brandon Liu: But maybe you only care about Seattle. You don’t want to have a copy of 100 gigs of the whole world. You only want Seattle. Or maybe you only want Capitol Hill. So the block size in the archive should be small if you only care about a neighborhood. But if somebody else wants all of Canada instead, then they want to be able to have a format that has big blocks so they can download Canada in one chunk.

So the Hilbert curve is useful because it encompasses both of those use cases without having to make a trade off. Because if you did small blocks, it would be good for Capitol Hill, it would be bad for Canada. If you did big blocks, it’d be good for Canada, it’d be bad for Capitol Hill. So because the Hilbert curve is sort of scale-free, it has the same self-similar structure at every power of two.

you sort of get the best of both worlds in one thing. And that’s really the motivation for why the Hilbert curve was useful for this design. I would say it’s not fundamentally essential. You could build a pretty good format just using like other space-filling curves or like a Z-order curve. There are some drawbacks in terms of it’s more computationally expensive to decode the Hilbert curve versus other ones.
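The "any block size, as long as it’s a power of two" property can be checked directly. The sketch below uses the textbook iterative mapping from grid coordinates to a Hilbert curve position; it illustrates the general technique and is not taken from the PMTiles code. The helper names are hypothetical. The point is that every aligned 2^k by 2^k block of cells occupies one contiguous run of 4^k positions on the curve, whether the block is neighborhood-sized or country-sized.

```python
from typing import Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) on a Hilbert curve over an n-by-n grid.

    n must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes a run of s*s positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip so the sub-curve has a consistent orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def block_span(n: int, bx: int, by: int, size: int) -> Tuple[int, int]:
    """Smallest and largest curve position inside an aligned square block."""
    ids = [
        hilbert_index(n, x, y)
        for x in range(bx, bx + size)
        for y in range(by, by + size)
    ]
    return min(ids), max(ids)


# On a 16x16 grid, a 2x2 block and an 8x8 block are each one contiguous run.
small_lo, small_hi = block_span(16, 6, 10, 2)
big_lo, big_hi = block_span(16, 8, 0, 8)
```

Because each cell maps to a distinct position, a span of exactly `size * size` positions means the block has no gaps, so it can be fetched as a single range.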

Jed Sundwall: Yeah.

Jed Sundwall: Okay.

Brandon Liu: For example, there are these Bing quadkey tile indexes that are much faster to compute than the Hilbert curve. For most use cases though, the cost of decoding and encoding the Hilbert curve is trivial compared to the network. If it spends two milliseconds doing a bunch of tile coordinates on Hilbert, then you’re spending 50 milliseconds fetching something over the network.

Jed Sundwall: Interesting.
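For comparison, a quadkey is cheap to compute because it just interleaves the bits of the tile's x and y coordinates, with no rotation step. This is a minimal sketch of the quadkey scheme as described in the Bing Maps tile system documentation; the function name `tile_to_quadkey` is hypothetical.

```python
def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of x and y into a base-4 string, one digit per zoom."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1  # x contributes the low bit of the digit
        if y & mask:
            digit += 2  # y contributes the high bit of the digit
        digits.append(str(digit))
    return "".join(digits)


key = tile_to_quadkey(3, 5, 3)  # "213"
```

A parent tile's quadkey is always a prefix of its children's quadkeys, which gives the same kind of hierarchical grouping, but consecutive keys can jump across the map where a Hilbert curve would stay adjacent.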

Jed Sundwall: Okay.

Brandon Liu: So like overall, like holistically, the price you pay for using the Hilbert curve is not that much relative to other things going on in like in some actual use case. But that’s like kind of the whole story as to why we use this like weird thing that is apparently in a movie as well.

Jed Sundwall: Yeah, I mean, just the movie. I turned the light red again, just because it’s kind of a spooky movie. Let me, there’s BV on YouTube who asked a question, if H3 grids are similarly useful, but one thing I want to clarify about the Hilbert curve and like to make sure I understand it, which I’m pretty sure I don’t, which is that like the idea is that you can map two dimensions along one dimension.

Brandon Liu: Yeah.

Jed Sundwall: Right? Like with, you know, you just have like one string that can be extended into two dimensions, like effectively anywhere at any resolution you want. If I’m doing, if I’m loading up the Canada tile, am I just loading up one band? Like, how does it, how does it work? Like, or is it making multiple requests to do that? That’s, can you explain that even? Like, it sounds like the kind of thing you would need a whiteboard to describe, but.

Brandon Liu: Yeah, you’re opening up multiple like, so if the entire world is on one length of string, then Canada is multiple segments of that range of string. Now, where you can adjust is how finely traced the borders of Canada are because

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Brandon Liu: If you’re working in a networked environment, you can do some optimizations. You can say, I’m going to grab a little bit more data than I need, but have fewer ranges. I can represent Canada using fewer segments of string, even though I get a little bit of America on the side.
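The trade-off described here, fetching slightly more than needed in exchange for fewer requests, is essentially range coalescing. The sketch below is a hedged illustration of that idea rather than the actual extract logic in any tool: it merges half-open byte ranges whenever the gap between them is at most `max_gap` bytes. The function name and the sample numbers are hypothetical.

```python
from typing import Iterable, List, Tuple

ByteRange = Tuple[int, int]  # half-open: (start, end)


def coalesce_ranges(ranges: Iterable[ByteRange], max_gap: int) -> List[ByteRange]:
    """Merge ranges separated by at most max_gap bytes into single requests."""
    merged: List[ByteRange] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough: extend the previous range and over-fetch the gap.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


wanted = [(0, 10), (12, 20), (100, 110)]
exact = coalesce_ranges(wanted, max_gap=0)    # three separate requests
relaxed = coalesce_ranges(wanted, max_gap=5)  # two requests, 2 extra bytes
```

Raising `max_gap` reduces the request count at the cost of extra bytes, which is the "little bit of America on the side" in the Canada example.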

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Jed Sundwall: Right.

Brandon Liu: Pretty much that, like there isn’t really one Canada tile, but you can sort of trace out a contiguous segment of the file that is all next to each other, that is all inside of Canada. And then maybe grab a little bit on the sides for like different outline areas. But the interior of Canada, as long as it’s like an area, you know, like most countries in the world or most regions are not like Chile where it’s just like one long thing.

most of them are like kind of rectangular-ish, you know, they have like an interior and then like a border. So this sort of space filling curve is well suited to how people usually think about areas as having like an internal volume and then being able to slice that into just parts of this space filling curve without having to, you know, like use an excess of

Jed Sundwall: Yeah.

Jed Sundwall: Okay.

Got it. And then one follow up question on that from the chat is that, is there a benefit here that also these requests are close to each other? Meaning like, you want to look at the full Canada tile and then like the Vancouver tile, should they be near each other? My intuition though is that that shouldn’t matter with object storage and range requests, because it’s not like you’re.

He’s saying like, it’s similar to like how you defragment an old spinning hard drive, but like, that’s not how object storage works. I mean, we’re not assuming that we’re using spinning disk. We might be, but do you have any insight there? Yeah.

Brandon Liu: Right, so it matters a lot on HDDs because it’s like on those old spinning hard drives, it’s like you have to move the needle more if they’re not by each other. But I think most storage now is solid state and there’s not a huge difference in the seek time for like a far away chunk versus a near chunk. But yeah, there are also benefits to certain operations. Just having parts that are close in space also be close in the file,

Jed Sundwall: You have a head. That’s right. That’s right.

Brandon Liu: that is taken advantage of in some parts of the tool.

Jed Sundwall: Okay. And then let’s, do you have opinions about H3? I mean, so BV is asking, are H3 grids similarly useful? I see it as probably not, but I don’t know how H3 content is. H3 is more of like an indexing concept, you know.

Brandon Liu: H3 is really useful for visualization. Yeah, I think it’s like, so H3 is like, you’re usually storing like a value in each cell. And I think it’s like, it’s really great for making like really good looking visualizations of data with hexagons. There are some trade-offs, like in H3, one hexagon does not perfectly nest

Jed Sundwall: Right.

Jed Sundwall: Yeah.

Brandon Liu: its child hexagons while in tiles there is a perfect nesting. But for certain use cases like showing like aggregate statistics, it doesn’t matter. So I would say H3 grids are the perfect match for certain use cases around visualization that are separate from doing tiling.

Jed Sundwall: Right, right.

Jed Sundwall: Right.

Yeah, exactly. Yeah, that’s sort of my understanding. And it is especially good for like visualization, but then also like statistics. Like, so if you’re doing like analysis on, I mean, you just think about the origins of it with Uber wanting to measure demand and activity in very, like very certain areas of different grains. It’s like perfect for that. So, okay. Well, look, we’ve been going for an hour and 15 minutes. This is incredible. We’ve got…

people staying up to all sorts of crazy hours. Guys, go to bed. Again, this is a podcast. Like this audio will go out so you can listen to it whenever. But I really, we have been honored. People are honoring us with their time. I hope this has been interesting for them. Brandon, I love talking to you. I love, I obviously love what you’re doing. We’re very proud to have you as a Radiant Earth Fellow and have had you as a fellow for a long time.

Man, are you serious? It’s this, Sigtil in Norway won’t let up. He’s got to go to bed, but he’s asking more questions. Are there some geometries that are not supported or more difficult? For instance, polygons with, boy, with holes and holes made of curves, et cetera. What was the most difficult geometry to work with across tiles? This is too hard of a question. Are you seeing this comment? Go for it.

Brandon Liu: No, it’s like I’m able to address this. Yeah, it’s I mean, this is a good like deep question, but it goes back to what I was saying, is that there are certain geometries that are hard to deal with. And a lot of it is you have to have a geometry library that is very robust against certain like numerical precision errors. And the only libraries right now that get it totally right are basically like GEOS, which is part of

Jed Sundwall: All right, do it and then we’ll wrap it up. Okay.

Jed Sundwall: Yeah.

Jed Sundwall: Yeah.

Brandon Liu: part of PostGIS and JTS, which is a Java library that is related to GEOS. And then a couple other ones, like there’s one that Mapbox made. But yeah, like that difficult geometry is the limitation in being able to write like an easy to install vector tile generator. So I would, I’m happy to follow up over email or something if you wanna like know more about like geometry processing, ’cause it’s like a really deep

Jed Sundwall: Yeah.

Brandon Liu: subject that sort of is a stealth hard problem. People don’t realize how hard that problem is until they find some weird geometry that’s broken. But yeah, that is a good question. And again, I’m happy to talk about it more.

Jed Sundwall: Okay, and then, so to contact you, I’m gonna just put in the chat, protomaps.com, go to protomaps.com, there’s info down at the bottom with how to reach you. So, you’re easy to reach. Obviously, everyone listening to this knows how thoughtful you are. So, anyway, I mean, thanks so much for what you’ve given to our community.

Can’t thank you enough. Anything else you want to talk about that we missed?

Brandon Liu: I just wanted to say thanks for having me on the podcast. I am also on the CNG Slack, the Source Cooperative Slack, which one do you want people to use? If people are CNG members, then they can join.

Jed Sundwall: That’s right, yeah.

Well, yeah, so CNG members, you got to be a member. For both, you kind of have to be a member. So membership to CNG is pretty cheap. We say it’s a symbolic fee. These memberships don’t really add up to pay many bills, but we ask people to pay to join CNG just to make sure that we know that people are there on purpose. They really want to be there. So join CNG if you’re not, and Brandon’s in the Slack there. Source is still invite-only.

But Source, so yeah, the best point of entry right now is the CNG, the Cloud Native Geo Slack. You can go to cloudnativegeo.org slash join and learn about it there. I’ll put that in the chat as well. But yeah, thank you. Yeah, it would be great to see people interacting with Brandon on any of our Slacks, but he’s easy to find otherwise.

All right. And then it’s, what is it? 8:17 in the morning there now.

Brandon Liu: It is, yeah. It’s bright and early.

Jed Sundwall: You got a whole day ahead of you. All right, well, happy Thursday. Thanks again for doing this. I bet we’ll do it again.

Brandon Liu: Awesome, yeah, I’m looking forward to the next episodes.