In this video I show how to map street names using a background layer with street-name data made available by IBGE.
Link to the file used in the video, with street-name data for some Brazilian cities, for mapping street names in OpenStreetMap.
Note: use this file as a background layer.
https://projeto.softwarelivre.tec.br/s/3zxd54iSjr7YQZ4
Link to the video: https://www.youtube.com/watch?v=2hyHqiViMMw&t=118s
Download the street-face layer files from the IBGE 2022 Census (CNEFE): http://gpsutil.com/CNEFE-2022/
Source: IBGE, 2022 Census
UMBRAOSM - União dos Mapeadores Brasileiros do Openstreetmap
This is cross-linked from my blog, posted on 11 July 2023
I recently joined the Humanitarian OpenStreetMap Team in moderating their Community Working Group and Tech Working Group’s discussions on AI-Assisted Mapping.
This, in part, was a reminder that I (still!) haven’t published an OSM diary that summarises my MA research, and that the publication is still in progress. But it was also a reminder that I have yet to really summarise or bring together what I have shared so far.
Here’s a short summary:
iD now has an “ID of the image on Panoramax” field, which is responsible for the panoramax=* tag. Everyone who, on my advice, has uploaded photos or panoramas to Panoramax can now tag POIs (and not only POIs) with the IDs of their photos on that service.
Every amateur mapper should know that all remotely sensed imagery (for example, satellite imagery and aerial photography) and other sources, except GPS traces, are usually misaligned with respect to reality. As a result, the mapper should be able to align the imagery in their preferred editor using GPS traces. The process takes 5 to 10 minutes and, due to GPS accuracy, results vary. That is not so bad if you are the only mapper within hundreds of kilometres over hundreds of years, but for collaborative mapping even a one-metre variation matters.
An analysis of the problem revealed two possible solutions, both equally effective. The imagery offset database and the offset plugin implement both; they do not exclude each other but complement one another, allowing the mapper to verify offsets not only with GPS traces but also with an alternative alignment method.
Each object in the OSM database has coordinates (in degrees, in the WGS84/EPSG:4326 projection), a creation date, an author and a description, which help determine the applicability and coverage of an offset.
It is good practice to use this database in our mapping of every area. If you are interested in learning how to use this mechanism, you can read: https://wiki.openstreetmap.org/wiki/ES:Imagery_Offset_Database
Some days ago, I searched online for a bus route that was supposed to be newly introduced to go from Kilkenny to New Ross. I didn’t find it, but I found another one which pleased me even more, which goes from Kilkenny to Fiddown (ref=891). The reason it pleases me is that the other route is already partly covered by another bus company and I don’t really need it, and the 891 covers a route that goes past several historical sites and at least two hiking routes. Since I don’t drive, I will certainly avail of it myself. I don’t mind organizing lifts for myself, and I enjoy the company of my “drivers”, but sometimes it’s good to be more independent. For context, the bus route started on January 20th 2025.
So I decided to track it, because I don’t really trust Transport for Ireland’s route maps, and I can’t be sure that they didn’t use proprietary map material to provide the routes online, even though their background map is OSM. But I have seen routes on their website which they seemed to have taken out of thin air which had nothing to do with the actual route the bus takes.
From the bus schedule, I had a fair idea of where the bus was going to go, and I had travelled most of the roads already and captured street-level imagery, but I thought it would be no harm to do it again. Of course, that is a bit tricky on the bus, and I couldn’t ask the driver to leave the GoPro Max on the roof of the bus. I had brought the magnetic foot, but it turns out there is very little flat metal surface on the bus. So the setup was a bit wobbly, but it worked. It only swayed in the bendier bends.
It’s partly held in place by my bag:
I was sure it was gonna crash at some point, especially with some of the bumpier rural roads, but it didn’t.
This is the sequence where I had least coverage before.
The battery lasted just about long enough.
As usual, I uploaded to Mapillary first and then to Panoramax. I might have forgotten to change the viewpoint angle on the Mapillary imagery… At least on Panoramax, I could go in and change it later.
A project I’m working on at the moment is adding maxspeed to the rural roads. On February 7th 2025, a directive from the Department for Transport lowered the speed limit on the “local rural roads” (the ones where the reference number starts with L and those without any reference number, I presume - I’m still waiting for a reply from the Department for Transport) from 80 km/h to 60 km/h. If you’ve ever driven on one of those, you’ll wonder how anyone in their right state of mind would even attempt to go at 80 km/h, but that was the law. We’re (OSM) still missing a lot of the L numbers nationwide, because of course, the government is not able to provide them as open data; that would be too easy. So, that was one of the things I looked out for - missing L numbers. It would have saved me a lot of frustration if they were displayed closer to the junction in OSMAnd, but I had to long-click on every one of them where it wasn’t displayed to see whether it was already mapped or not. I also added notes for missing speed limit signs, even where they wrongly still displayed 80 km/h. I mapped those on the desktop PC without the wrong speed limit, just to know where the traffic signs are for later reference and to be able to split the highway at the correct location. I’m also not quite sure how to map speed limit signs which have different speed limits on the front and back, because I can’t use direction then.
I used OSMAnd for tracking the routes, adding notes and adding the odd POI (post boxes, defibs, wayside shrine etc).
When I got on the bus, I asked the bus driver to point out where the new bus stops are, because I did not expect bus stop signs for most of them. He said I was right, because it was “a new bus route”. Sure, that makes sense. We wouldn’t want people to know where the bus stops are to promote the route and establish sustainable routines, would we?
Anyways, some locations are therefore also only estimates, because he didn’t slow down, and just told me the name of the place in passing. Someone will have to re-survey in about a year, when there is a slightly better chance of bus stop signs, but I have my doubts that they will ever come. As a Canadian friend put it, “oral tradition is still very strong in Ireland”.
The “highlight” of the trip was an unscheduled excursion into the bus depot, because the heating on the bus wasn’t working, and the gaggle of bus drivers (four at some point on the bus) thought it would help to add water somewhere. It didn’t. So, four hours at about 8°C. The sacrifices we make for the general good. But the bus depot is mapped now, which one would not be able to locate from the street.
Here’s the route, if anyone is interested.
I also spotted a couple of newly built or under-construction houses for which I also left notes. Most of them are not visible on the outdated aerial imagery yet, so I drew estimates and left fixmes.
I think I spotted one house number and three house names which I added to the map. Because, unlike in other countries like Lithuania or the Netherlands, the government also doesn’t publish open address data. Do I sound frustrated? That’s because I am.
I think in total, I left 7 pages of notes: Notes by b-unicycling.
All in all, I’m glad to report that I was not the only passenger on the bus; there was a bus driver in training and for most of the journey, there were one to four other people with me. The word has spread already, it seems.
Learn about our plan for Phase 2 of the AI for Earth Observation and Field Boundaries Initiative and how to get involved.
The post Join TGE in Phase 2 of AI for Earth Observation and Field Boundaries appeared first on Taylor Geospatial Engine.
50+ Female Product Manager/Senior GIS Analyst at North Road, Program Chair FOSS4G Oceania 2022-2024, QGIS AU Committee Q. Emma, Where in the world are you and what do you do? In sunny South East Queensland where the winter temperatures are in the 20s (degrees Celsius), but I am more of a -1° Celsius gal. I […]
The post When I was 15 years old, I proclaimed I would be a cartographer appeared first on GeoHipster.
To follow along with this tutorial, please set up and configure an Oracle Cloud instance for mapillary_tools. Please refer to this guide for instructions on how to accomplish this.
ssh -i ~/.ssh/oracle_mapillary_keys/[your private key (not .pub)] opc@[IP Address copied from Oracle]
scp -i ~/.ssh/oracle_mapillary_keys/[your private key (not .pub)] [/path/to/local/360/file] opc@[IP Address copied from Oracle]:~/mapillary
mapillary_tools process_and_upload ~/mapillary/*.360
rm ~/mapillary/*.360 && exit
mkdir ~/.ssh && mkdir ~/.ssh/oracle_mapillary_keys
mv ~/Downloads/[your private & public keys] ~/.ssh/oracle_mapillary_keys/ && chmod 600 ~/.ssh/oracle_mapillary_keys/[your private key (not .pub)]
yes
ssh -i ~/.ssh/oracle_mapillary_keys/[your private key (not .pub)] opc@[IP Address copied from Oracle]
pip3 install --user --no-cache-dir --upgrade pip && pip3 install --user --no-cache-dir mapillary_tools
echo 'export PATH=$PATH:~/.local/bin' >> ~/.bashrc
source ~/.bashrc
mapillary_tools --version
mapillary_tools authenticate
mkdir -p ~/mapillary
Seven months ago, we issued A Call to Action for the Data Community to break down geospatial data silos and make GIS a core part of analytics. Today, we’re thrilled to announce two major developments that bring this vision closer to reality:
The Parquet specification has officially adopted geospatial guidance, enabling native storage of GEOMETRY and GEOGRAPHY types
Iceberg 3 now includes GEOMETRY and GEOGRAPHY as part of its official specification
Now both Parquet and Iceberg support columns of type GEOMETRY or GEOGRAPHY just like INT32, INT64, FLOAT32, etc. columns! Yay! This is a landmark achievement for geospatial data! 🎉
First, a heartfelt thank you to everyone who contributed to this effort—engineers, early adopters, and advocates who pushed for geospatial data to be treated as a first-class citizen. This milestone wasn’t achieved overnight; it took years of collaboration across organizations and ecosystems. From the early days of GeoParquet 1.0 to today’s native Parquet support, this progress demonstrates the power of open-source community action.
The GeoParquet initiative has always aimed to make geospatial data “boringly interoperable.” With Parquet and Iceberg now supporting geometry types natively, GeoParquet is entering its next phase.
GeoParquet 1.0/1.1: Parquet files with additional metadata to “label” geometries/geographies
GeoParquet 2.0: Regular Parquet files utilizing native GEOMETRY and GEOGRAPHY data types
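As a rough sketch of the 1.x workflow described above (not an official example: the file names and the "name" column are placeholders, and current GeoPandas releases write GeoParquet 1.x metadata rather than the native 2.0 types), writing and reading GeoParquet from Python can look like this:

import geopandas as gpd

# hypothetical input file; any vector source GeoPandas can read would work
gdf = gpd.read_file("neighbourhoods.geojson")

# writes a regular Parquet file plus "geo" metadata labelling the geometry column (GeoParquet 1.x)
gdf.to_parquet("neighbourhoods.parquet")

# columnar round-trip read; only the requested columns are loaded
subset = gpd.read_parquet("neighbourhoods.parquet", columns=["geometry", "name"])
print(subset.crs, len(subset))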
While native support represents the future of geospatial data storage, adoption will take time. We recommend:
Continuing with GeoParquet 1.1 for production systems until tools fully support Parquet’s native geospatial types. A few pioneer implementations have started.
Planning for eventual migration to GeoParquet 2.0
Following our upcoming migration guides and best practices, and the discussions on the exact differences between versions. In an ideal world we would make GeoParquet 2.0 files also compatible with 1.1 and 1.0; stay tuned for that.
While achieving native geospatial type support is a significant milestone, our work isn’t finished. Our immediate focus areas include:
Developing best practices for GeoParquet 2.0 implementation
Creating clear transition guidelines from previous versions
Establishing standards for CRS handling and performance optimization
Continuing outreach and advocacy for widespread adoption
This is just the beginning of modernizing geospatial data storage. We’re already looking ahead to other types of geospatial data such as raster, point cloud, spatial indexes…
The journey to truly integrated geospatial analytics continues, but with GeoParquet 2.0, we’ve taken a major step forward. Stay tuned for more updates and guidance as we work toward making geospatial data a natural part of every analytics stack. And if you’d like to be more involved we’ll be working in the GeoParquet GitHub repo. We also run bi-weekly meetings on advancing geospatial in Parquet, Iceberg and Arrow, just join the geoparquet-community group and you’ll be added to the calendar. And we’re also starting up a meeting for implementors of geospatial in Iceberg to share best practices and work through any issues.
The GRASS GIS 8.4.1RC1 release provides more than 70 improvements and fixes with respect to the release 8.4.0. Please support us in testing this release candidate.
The post GRASS GIS 8.4.1RC1 released appeared first on Markus Neteler Consulting.
Researchers used four years of measurements from a deep space satellite to calculate the average monthly heights of Saharan dust clouds.
The post Deep Space Mapping of Saharan Dust Height appeared first on Geography Realm.
So I recently started working on adding the features of a local high school into a relations group, but after looking at the docs, I’m not sure I’ve properly understood when or how relations should be used. Could somebody help clarify this?
Geomob Berlin took place at 18:00 on Wednesday the 12th of February, 2025 at Fora - Pressehaus Podium, Karl-Liebknecht-Straße 29A, 10178 Berlin (Google Maps, OpenStreetMap). Nearest stops are Alexanderplatz and Rosa-Luxemburg-Platz.
Our format for the evening will be as it always has been:
doors open at 18:00, set up and general mingling
at 18:30 we begin the talks with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. After each talk there will be time for 2-3 questions.
We vote - using Feature Upvote - for the best speaker. The winner will receive a SplashMap and unending glory (see the full list of all past winners).
We head to a nearby pub for discussion and #geobeers paid for by the sponsors.
Christian Wygoda, Sensor Tasking API Spec (at SatVu)
Javier Jimenez Shaw, Measuring distances in maps.
Henrik Schönemann, GIS meets Gedenkstätte
Riccardo Klinger, The Great German Data Treasure Hunt
Want to speak at a future event? Volunteers needed.
Geomob Berlin is organized by Peter Rose and Ed Freyfogle
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
Fremantle
· wdlocator · Wikimedia · OSM ·
I've upgraded toolforge:wdlocator to PHP 8.2 and Symfony 7, and in doing so I think I have fixed a long-standing (but unknown to me!) bug with how it was selecting the user interface language. It's supposed to change based on the Accept-Language header, but there was a bug with that in our ToolforgeBundle. I think we fixed that bug ages ago, but I forgot to update wdlocator. So now I have, and it can be read in Indonesian at e.g. https://wdlocator.toolforge.org/?uselang=id#map=17/-8.72520/115.17650
(I mention Indonesian, and the map above is centred on Denpasar, because that's where I'm going tomorrow. For the Wikisource Conference.)
I know I should also add a UI for actually selecting a language, but that'll have to wait.
Fremantle
· OSM · history · OpenHistoricalMap ·
Amazing work has been done by CharliePlett with Mapping the British Empire on OpenHistoricalMap:
That screenshot is how the Empire looked between 1900-10-09 when the Cook Islands were added, and 1900-10-19 when Niue was added (not to bowdlerise the act of empire building or anything with a word like 'added').
Well, I’ve just discovered that anybody can edit like it’s wikipedia. So instead of doing homework, I’ve spent several hours fixing little things in my hometown. This is fun.
npm install leaflet react-leaflet @types/leaflet

import 'leaflet/dist/leaflet.css';
import { MapContainer, TileLayer, GeoJSON } from 'react-leaflet';

<MapContainer center={[39.8283, -98.5795]} zoom={4} style={{ height: '400px', width: '100%' }}>
  <GeoJSON data={stateGeoJSON} style={stateStyle} onEachFeature={onEachFeature} />
</MapContainer>
I used this query in overpass turbo:
[out:xml][timeout:25];
// fetch area “Kildare” to search in
{{geocodeArea:County Kildare}}->.searchArea;
// gather results
(way["highway"="tertiary"]["maxspeed"="80"](area.searchArea);
way["highway"="unclassified"]["maxspeed"="80"](area.searchArea);)->.roads;
// print results
(.roads;>;); out meta;
and loaded it directly into JOSM, then replaced 80 with 60, and added maxspeed:type=IE:rural as suggested on the talk-ie list, which I hope will make the difference between “rural” local roads and those at the edge of urban areas obvious.
I couldn’t find any exceptions listed by Kildare County Council, so I’m reasonably confident I shouldn’t have scooped up any inappropriately, but do shout if I’m wrong.
Hydroclimate whiplash - the rapid swing between drought and heavy precipitation - plays a role in the increasing intensity of California wildfires.
The post Hydroclimate Whiplash: the Impact on California Wildfires appeared first on Geography Realm.
Coloured_Suburb map paint style: https://josm.openstreetmap.de/wiki/Styles/Coloured_Suburb
Coloured_Streets map paint style: https://josm.openstreetmap.de/wiki/Styles/Coloured_Streets
Contact by e-mail: [email protected]
UMBRAOSM - União dos Mapeadores Brasileiros do Openstreetmap www.umbraosm.com.br
In this video, see how to install a plugin in the JOSM editor and start mapping buildings much more quickly and easily.
Link to the video: https://www.youtube.com/watch?v=q6vVhAC4BKo
UMBRAOSM - União dos Mapeadores Brasileiros do Openstreetmap www.umbraosm.com.br
For PostGIS Day this year I researched a little into one of my favourite topics, the history of relational databases. I feel like in general we do not pay a lot of attention to history in software development. To quote Yoda, “All his life has he looked away… to the future, to the horizon. Never his mind on where he was. Hmm? What he was doing.”
Anyways, this year I took on the topic of the early history of spatial databases in particular. There was a lot going on in the ’90s in the field, and in many ways PostGIS was a late entrant, even though it gobbled up a lot of the user base eventually.
How to map a neighbourhood boundary in OpenStreetMap with a custom background layer from the 2022 Census.
IBGE has made neighbourhood boundary data available that can be used for mapping in OpenStreetMap, but we first need to customise the IBGE neighbourhood boundary layer so that it can be used for mapping in OpenStreetMap.
In this video I show how to create a custom layer with the 2022 Census data: https://www.youtube.com/watch?v=0eKytB2F28A
You can download the custom background layer used in this video from this link: https://projeto.softwarelivre.tec.br/s/jdoPGWyTysPGyFa
Link to the video: https://www.youtube.com/watch?v=sTe-1N2QvLY&t=16s
União dos Mapeadores Brasileiros do Openstreetmap
To contact us, our e-mail: [email protected], our website: www.umbraosm.com.br
My end goal: have all the ferries in the world listed in the best possible way on OpenStreetMap!
Have you benefitted from Cloud-Optimized GeoTIFFs? SpatioTemporal Asset Catalogs? Zarr, COPC or GeoParquet? Not just the formats, but the whole ecosystem of tools and data around them? Well I’d like to present you with an incredibly easy opportunity to ‘pay it forward’ and help build and expand the movement. And all you have to do is attend a conference! One that should be a totally awesome experience, the first in-person CNG Conference, from April 30th to May 2nd.
I have big dreams for this conference, as my hope is that it can expand in the next few years to become a truly vendor-neutral gathering for anyone working in and around geospatial data. To be one of those conferences that has the critical mass where you know ‘everyone’ you want to talk to will be there. In North America there are really only two options for this: Esri UC and GeoINT. Both are incredible events, but Esri UC controls their guest list (as they should…) and GeoINT is very focused on defense and intel (as it should…). I think a third would go beyond ‘GIS’ and beyond the strong core military-oriented use cases that do provide the core economic engine for the industry today. It’d be a big tent that is welcoming of anyone working with geospatial data on any use case, at any scale, with any tool.
The first CNG Conference will be the biggest Cloud Native Geo event ever, but it will not be a huge affair, and it’s really important that we ‘sell it out’ and demonstrate the momentum to get to larger venues and larger sponsors for the next few years. And I believe it’s going to be an awesome gathering of all types of people who like to nerd out on solving real world challenges with geospatial data and insights. You’ll certainly learn some new stuff, and make connections that you’ll likely reap rewards from for years. The cloud-native communities are a collection of tribes working on related problems, and this is going to be the first true in person gathering that combines the different tribes, to borrow from how FOSS4G is often described. FOSS4G is one of my favorite conferences, and my hope is we can get that same energy, but expand from open source software to be inclusive of any commercial software, and also to be more centered around data (and standards).
So please, buy your ticket soon and join us! And don’t worry if you’re not already deep in Cloud-Native Geospatial, or even barely know what it is, as a major goal is to help everyone learn. One of the three primary tracks is ‘On-ramp to Cloud-Native Geospatial Data’.
There’s a great line-up of speakers, with more coming. And if you want to give back ‘more’ than just attending, please apply to present and share successes you’ve had with Cloud Native Geo. If you work for or lead in an organization that has benefitted from CNG then please try to get them to sponsor. There are a number of benefits your organization gets from sponsoring, and not just at the event — it also comes with commercial membership to the Cloud Native Geo Forum. This gets 8 individual memberships to CNG, plus a blog post on the CNG website, and a speaking slot at an event. And I’m pretty sure if they come in as sponsors soon then that spot will be at the CNG Conference. It will be a great opportunity for your products/work to reach an influential audience, in a vendor-neutral environment. If you need any help convincing your organization to sponsor feel free to get in touch — I’m happy to help.
I’m excited about how this event will build momentum for the CNG Forum. It’s a consolidation of this movement into a bit more of a formal structure, with a great mission:
The vision is to make geospatial data easier to access and use, and to grow the resources being invested to make that happen. It’s all under the Radiant Earth non-profit (501(c)3), which means that all revenue from membership and the conference (after covering costs) goes back into strengthening the community, bringing people together more, and fulfilling the vision.
So please buy your ticket, join the CNG forum, present and/or sponsor. It will make a real difference, and I promise it will be a valuable and fun time in Utah.
Additions:
the new runs of line 19, extended from Piazza della Libertà to Barcola
line 19/ between Largo Barriera and Barcola
line 20 removed (previous trial)
For more information visit tpltrieste.it
New Light Technologies (NLT) is on its way to attend Geo Week 2025 at the Denver Colorado Convention Center from February 10th through 12th. As an exhibitor, NLT will showcase its cutting-edge IMPACT platform—a robust tool designed to enhance disaster management and incident response. Visitors to Geo Week can find NLT at booth 507, where they’ll have the opportunity to explore the platform's capabilities firsthand.
30/01/2025-05/02/2025
[1] An interactive air quality map for Kigali, Rwanda. © Open Seneca | © Mapbox | Map data © OpenStreetMap Contributors.
traffic_sign:id=* to explicitly reference official traffic sign identifiers, improving data accuracy and interoperability with external databases. The code is available on GitHub.
panoramax=* and wikimedia_commons=* tags. Also added are the display of users’ GPS tracks on the map and when opening notes from StreetComplete. The changeset history page now displays the first comment on a changeset. You can read about the other functions of the script in their other diary entries.
Note: If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.
This weeklyOSM was produced by MarcoR, PierZen, Raquel Dezidério Souto, Strubbl, TheSwavu, barefootstache, mcliquid.
We welcome link suggestions for the next issue via this form and look forward to your contributions.
France is one of the few countries in the world to use associatedStreet relations. Using relations requires putting quality-control rules in place to ensure everything works properly when the data is reused by third-party tools, in particular satnavs.
Checks already exist in JOSM, Osmose and Pifomètre:
Here are a few Overpass queries that will let you improve the associatedStreet relations in your region (the queries are compatible with loading data into JOSM from Overpass Turbo):
[out:xml][timeout:25];
{{geocodeArea:Ain}}->.searchArea;
relation["type"="associatedstreet"](area.searchArea);
(._;>;);
out meta;
[out:xml][timeout:25];
{{geocodeArea:Ain}}->.searchArea;
relation["type"="associatedStreet"][!"name"](area.searchArea);
(._;>;);
out meta;
[out:xml][timeout:25];
{{geocodeArea:Ain}}->.searchArea;
relation["type"="associatedStreet"](if:count_by_role("strteet") > 0)(area.searchArea);
(._;>;);
out meta;
[out:xml][timeout:25];
{{geocodeArea:Ain}}->.searchArea;
relation["type"="associatedStreet"](if:count_by_role("") > 0)(area.searchArea);
(._;>;);
out meta;
Happy gardening!
I have been a member of OpenStreetMap US for five years, engaged in fostering open data and community collaboration. My first experience with OpenStreetMap was while working in emergency management in 2010 when the Haiti earthquake occurred. As someone trained in disaster response, I watched the events in Haiti unfold and looked for ways to contribute. It was also my gateway into other open data & software communities, such as QGIS and OSGeo, which informed my educational approach during my time at the University of Arizona Libraries.
I helped host the first MappingUSA virtual conference and, in 2022, hosted the State of the Map US conference at the University of Arizona in Tucson, marking the first in-person gathering post-COVID. I currently work at Development Seed, an organization with a long-standing commitment to OpenStreetMap. I drive our team’s community strategy by supporting OpenStreetMap events and engaging with other open geo communities. My background in geospatial technology, open data advocacy, and community engagement equips me with a unique perspective to support and strengthen OSM US.
OpenStreetMap is more than just a map—it is a platform for civic engagement, education, and open collaboration. As a board member, I would focus on:
Strengthening Educational Initiatives: While I was at the University of Arizona, I benefitted from the work of people who are part of TeachOSM and YouthMappers. I was the only geospatial specialist at the Library, where I taught workshops, developed learning materials, and consulted with researchers on geospatial data and tools. OSM is a fantastic resource for engaging students in geography, technology, and their local communities. I want to see OSM US build more tools and resources that make it easier for educators to bring OSM into the classroom and would work to implement a plan for growth in this area.
Expanding Government Engagement: OpenStreetMap is gaining traction among state and local governments for data use and contributions, particularly through the Public Domain Map initiative. I want to strengthen these partnerships to ensure OSM continues to serve as a critical public resource. I would work with the OSM US team to further develop the strategy around government engagement.
Connecting with the Global OSM Community: As one of the larger local OSM chapters, it is essential for OSM US to engage meaningfully with the global OpenStreetMap community. Strengthening these ties will allow us to learn from and contribute to the broader OSM ecosystem. I have existing connections to active contributors in Latin America and Europe.
Exploring Opportunities with Other Open Data Communities: I am particularly interested in how OSM US can collaborate with other open data initiatives to build a stronger, more resilient network. We can ensure mutual benefit, reinforce shared values, and create a more impactful open data ecosystem by working together.
My career has been dedicated to making geospatial data more accessible and actionable. As the Technical Communications Lead at Development Seed, I help translate complex geospatial technology and tools into information that empowers users. Previously, I worked in academia, leading geospatial literacy initiatives and organizing GIS-focused events to connect students, researchers, and practitioners. I have also worked in emergency management and disaster response, where I saw firsthand the importance of open data in crisis situations. My strengths lie at the intersection of technology, community, and communication. I love helping non-technical users engage with geospatial data and believe that OSM US should continue to invest in making OpenStreetMap as accessible and inclusive as possible. If elected to the board, I will advocate for OSM US to grow as a leader in open data, education, and government engagement, ensuring our map remains a dynamic and valuable public resource. I would be honored to serve and look forward to collaborating with the OSM community to build a stronger, more connected, and impactful OpenStreetMap US.
More about me on LinkedIn
Link to the video: https://www.youtube.com/watch?v=0eKytB2F28A
Link to download the neighbourhood boundary file from the IBGE website: https://www.ibge.gov.br/geociencias/downloads-geociencias.html?caminho=organizacao_do_territorio/malhas_territoriais/malhas_de_setores_censitarios__divisoes_intramunicipais/censo_2022/bairros/shp/UF
My name is Oyama Kaname (大山かなめ), and I have been mapping mainly in Saitama Prefecture since 2023.
I feel a little hesitant that my first post is about a topic like this, but if an editor has repeatedly made mistakes or added incorrect information, what can be done about it?
I have pointed out the problems on the changesets in question using the discussion feature, but because they are old, notifications may not reach the editor and there is often no response.
Would using the messaging function be the best approach?
Hey folks! I’m Gregory Power (they/them) and I’ve been a part of the map since November 2023. I am currently a Data Scientist (Contractor) for Cary, North Carolina—where I manage Cary’s Open Data Portal and other analytics infrastructure. In my spare time I enjoy learning about urban planning and equitable, multimodal infrastructure. I’m also involved with the Pedestrian Working Group, Government Working Group, and my community’s Strong Towns chapter. I enjoy contributing to the open source geospatial software community, with a soft spot for GDAL, QGIS, GRASS GIS, and DuckDB—so everyone can have the tools to understand the world around them. Even though there’s a great set of tools for us to use, it’s nothing without having an open ecosystem of data.
My first project in OpenStreetMap was tracing plans for Cary’s Downtown Park into OpenStreetMap. With the updated layout, Cary’s Integration and Development Team could have a basemap to put our sensor data on. Once I realized OpenStreetMap data was used by all of our applications across the town and beyond, I was hooked. I enjoy mapping multimodal transportation infrastructure, handicapped parking, and restaurants. I’ve trained team members on conducting field surveys with StreetComplete and captured street imagery with Mapillary.
It’s important that communities have access to data and the ability to make changes to increase the fidelity of the data. These are the objectives I’d prioritize:
Empowering others to find solutions to their problems fills my cup. OpenStreetMap provides an ecosystem where we can form mutually beneficial relationships. If anyone has any questions or comments, I’d enjoy hearing from you.
Tell Us About Yourself My name is Kseniia. Right now, I am a student in the International Cartography Master program, but before starting my studies, I worked for many years as an analyst in the field of urban and transport planning. I think it’s because this field often involves working with different barriers (physical, social […]
The post Maps and Mappers 2025 – January – Kseniia Nifontova appeared first on GeoHipster.
Due to some issues I’m running behind just a bit – so enjoy this map from December of 2024. Tell Us About Yourself I am a Senior Cartographer at National Geographic Maps, own my own freelance business, Tombolo Maps & Design, and have been making maps professionally for 15 years. My maps have been in more […]
The post Maps and Mappers 2024 – December – Aly Degraff Ollivierre appeared first on GeoHipster.
The number of OSM-using companies joining the OSM Foundation Corporate Membership program has increased significantly in recent months.
In all, eight new companies joined and five increased their giving levels in 2024.
>> Read more about becoming a Corporate Member of OpenStreetMap
Starting from the top: Long-time OSM supporters, ESRI, Meta and Microsoft have joined TomTom at the Platinum giving level. The Platinum tier is suggested for companies for which map applications are core to their business; and/or they have a product that depends on OSM data and/or revenue in the hundreds of millions.
>> Read more about Meta’s recent generous donation and partnership
Five new names now appear as supporters at the Silver giving level: global gaming and AR company Niantic, QGIS, calimoto, Mapy.cz, and ioki. Silver is recommended for companies who use OSM data in a product or service and have revenue in the millions.
“OpenStreetMap Foundation’s community-driven approach helps keep the map of the world as accurate as possible. As map lovers and builders ourselves, we are excited to help support the OSMF mission” – Yennie Solheim, Niantic Director of Social Impact
E-Smart, which specializes in dynamic speed management, and LANDCLAN, offering location intelligence data and tools, are new joiners at the Bronze level. Interline, a transportation network consultancy, and Verso, a maker of wearable technology, both upgraded their membership from Supporter to Bronze.
And Infrageomatics, offering location intelligence derived from open source infrastructure data, has joined as a new Supporting member.
The success of OpenStreetMap depends on organizations that make financial contributions, donations “in kind” such as hosting services and other resources, and hardware. These philanthropic investments help ensure site stability, support the maintenance of technical infrastructure, and help sustain OSM’s volunteer community.
The OSM community and the OSMF are extraordinarily grateful for the sustaining contributions of Corporate Members.
The Bhuvan portal, developed by the Indian Space Research Organization’s National Remote Sensing Centre, is powered by OGC Standards and caters to 150,000 unique users per day, achieving an impressive 20 million hits daily.
The post Bhuvan: Transforming India’s Governance with Geospatial Insights appeared first on Open Geospatial Consortium.
This was harder than I thought it would be. The lesson for re-drawing a road was drag-and-drop. I could not insert an extra node in the Administrative Boundary in order to truncate the triangle.
I had to create a new Administrative Boundary and relate it to the existing boundary. Then delete the node at the top of the old boundary.
I have requested a review, as I’m not sure that is the correct or accurate way to perform the task.
Covering an area of about 400 square miles, Tampa Bay is Florida's largest open water estuary.
The post Florida’s Largest Open Water Estuary appeared first on Geography Realm.
We are happy to announce the release of Nominatim 5.0.0. This major release marks the end of a 4-year journey to modernize and modularize the Nominatim codebase in order to make it easier to use and maintain.
This release finishes the mutation of Nominatim into a Python package. The PHP frontend, bundled osm2pgsql and cmake build scripts have now been removed for good. If you are still using one of these features, then you should update your software to Nominatim 4.5 and then move to the new Python frontend and pip installation. Once done, you can easily update to the latest version 5 release.
Also in this release, the osm2pgsql import style configuration has largely been rewritten. If you are using one of the built-in styles, this will not make much of a difference. If you are maintaining your own custom style, however, this should become much easier. Most notably, it is now possible to start with one of the existing styles and add your customizations on top. That should make it much easier to keep in sync with the latest changes in Nominatim. Have a look at the updated documentation for details. The new implementation is largely backwards compatible, so your old scripts will keep working for now.
With the new osm2pgsql style implementation comes the ability to use Nominatim together with osm2pgsql-themepark. This comes in handy when you want to combine Nominatim with other osm2pgsql flex styles in order to host OSM data for different purposes in the same database. Check out the updated cookbook about how to run Nominatim with osm-carto to learn how to use this feature.
Finally, Nominatim has a new hook for adding pre-processing functions for incoming search queries, allowing custom filtering to be applied. The first filter to use this new functionality breaks up Japanese addresses into their parts.
A full list of changes can as always be found in the Changelog.
The rationale behind this was that the COVID-19 pandemic has led to diverse experiences influenced by public health measures like lockdowns and social distancing. To explore these dynamics, we introduce a novel ’big-thick’ data approach that integrates extensive U.S. newspaper data with detailed interviews. By employing natural language processing (NLP) and geoparsing techniques, we identify key topics related to the pandemic and vaccinations both in newspapers and personal narratives from interviews, and compare the (spatial) convergences and divergences between them.
Abstract:
In the face of the unprecedented COVID-19 pandemic, various government-led initiatives and individual actions (e.g., lockdowns, social distancing, and masking) have resulted in diverse pandemic experiences. This study aims to explore these varied experiences to inform more proactive responses for future public health crises. Employing a novel “big-thick” data approach, we analyze and compare key pandemic-related topics that have been disseminated to the public through newspapers with those collected from the public via interviews. Specifically, we utilized 82,533 U.S. newspaper articles from January 2020 to December 2021 and supplemented this “big” dataset with “thick” data from interviews and focus groups for topic modeling. Identified key topics were contextualized, compared and visualized at different scales to reveal areas of convergence and divergence. We found seven key topics from the “big” newspaper dataset, providing a macro-level view that covers public health, policies and economics. Conversely, three divergent topics were derived from the “thick” interview data, offering a micro-level view that focuses more on individuals’ experiences, emotions and concerns. A notable finding is the public’s concern about the reliability of news information, suggesting the need for further investigation on the impacts of mass media in shaping the public’s perception and behavior. Overall, by exploring the convergence and divergence in identified topics, our study offers new insights into the complex impacts of the pandemic and enhances our understanding of key issues both disseminated to and resonating with the public, paving the way for further health communication and policy-making.
Figure: An overview of the research workflow.
Figure: The monthly distribution of collected articles in the United States from January 2020 to December 2021.
Figure: An example of identified entities labeled with predefined entity types.
Figure: The spatial distribution of newspaper articles by different scales.
Figure: The spatial distribution of identified newspaper topics across different regions in New York State.
Figure: Ordered rank of identified topics by percentage from interviews.
Chen, Q., Crooks, A.T., Sullivan, A.J., Surtees, J.A. and Tumiel-Berhalter, L. (2025). From Print to Perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews, PLOS Digital Health. Available at https://doi.org/10.1371/journal.pdig.0000736. (pdf)
Hello!
I’ve just found out that ~2k out of ~13M relations have no type tag, ~220 of which have the type tag with a lifecycle prefix or date namespace suffix, so I’ve decided to review them manually, starting with the ones that have no tags containing the word “type”, correct some mistakes (e.g. a missing type=multipolygon) and delete the unnecessary relations (e.g. those with duplicate tags). In case I make a mistake, please point it out on the OSM Community topic or comment on my changesets.
Regards
Kamil Kalata
In the February 2025 edition of our interview series with OpenStreetMap communities around the world, we speak with open source geospatial software and OpenStreetMap veteran Just van den Broecke about the OpenStreetMap Alpumapa Workshop.
1. Who are you and what do you do? What got you into OpenStreetMap?
Just van den Broecke from The Netherlands, a long-time open source geospatial professional. I wrangle geospatial data, a.k.a. ETL, work on geospatial web services like OGC REST APIs, and make maps like map5topo.nl, a topographic map of The Netherlands.
I registered my OSM account ‘justb’ in 2005. Back then I was mapping inline-skating routes, and I contributed to hiking routes for years. Though I was not a very active mapper, I recently became active again, also through this project and joining OSM-ES.
2. What is “Mapas y Tapas” in Alpujarra and why did you create it? Who participates?
“Mapas y Tapas” is a free Spanish translation of “Mapping Party”. We are a group, now mainly around Ugíjar, that started from a few workshops. The idea is to get together in an Alpujarra village bar and go out mapping (the “Mapas”), then reconvene at the bar, discussing the results over drinks and “Tapas” (free with every drink in Andalusia).
Why? Several reasons: the Alpujarra, or “Las Alpujarras”, a hidden gem in Spain, is sparsely mapped in OpenStreetMap. Municipalities are attracting digital nomads as one way to counter depopulation; for this, Alpujarra Knowmads was established. So at the very least the area needs to be mapped, in particular amenities (bars, shops, opening times…). Also, the well-known commercial map providers have sparse and outdated maps. Another reason is that just after I joined OSM-ES (Telegram) in October 2024, the DANA storm in Valencia broke out. It turned out that most of the small villages, “pueblos”, around Valencia were poorly mapped, hindering navigation for aid workers. The OSM-ES community quickly established a HOT project where hundreds of mappers worldwide mapped the areas as best they could. DANA hit Málaga and mostly Valencia; the Alpujarra, which lies in between, was lucky this time… And more recently: OSM is a perfect alternative to “Big Tech” maps. I try to convince folks, also non-mappers, to use OSM-based apps like Organic Maps.
Basically anyone participates, young & old, both local Spanish and what we could call immigrants or expats, mostly from Northern Europe, like UK, Denmark, The Netherlands.
3. What are the unique challenges and pleasures of mapping in this region of Spain? What aspects of the projects should the rest of the world be aware of?
There are many pleasures first: being outdoors in magnificent landscapes and lovely, mostly formerly Moorish, whitewashed villages. Also, the fact that there is so much to map means that, say when mapping amenities with EveryDoor, you feel you are accomplishing a lot. I showed before/after maps and the results are staggering. We also have some remote Spanish “armchair mappers” helping out, and the support from the Spanish OSM community is warm.
Challenges? Well, speaking a bit of Spanish helps. Also convincing older local people. Some areas are really remote, hard to drive to with regular cars. And of course: beware of getting burnt by the sun! And please: use the Buildings and Addresses Import Spain procedures to import, instead of drawing buildings from aerial imagery, however tempting. The Spanish Cadastre provides open (INSPIRE) data!
The greatest challenge is marketing and communication: how and where to announce things, how to keep in touch. Within the region, Facebook and Instagram are still a major communication channel for villages. It is also a challenge to explain what OSM is about and why it matters for the region. Communication also often happens with leaflets and cars with speakers driving around.
4. What have you learned? What is the best way for people to do something similar in their region?
Start super-simple: only with the EveryDoor app. Maybe StreetComplete, though that is Android-only. Leave iD, JOSM etc. for much later. The advantage is: no need to bring a laptop. Also let them install Organic Maps (OM) and do some navigation, share a GPX file and hike together. Record the track with OM.
Also hold regular meetups, maybe once a week, at a fixed time and place; it is very hard to maintain momentum. There are many distractions here, as the area is also well known for its informal social networks, and every week there is some “fiesta”. But to be honest, “distractions” is not the right word: the local communities of both Spanish and non-Spanish blend well together, making for a warmer and richer social life than in many of the big cities and coastal towns.
5. What steps could the global OpenStreetMap community take to help support local mapping like this?
Somehow having a platform where smaller groups can gather and where announcements can be made. The OSM-ES Telegram group (the OSM Forum is hardly used in Spain) is very welcoming but overwhelming, with all messages in a single thread, especially for newcomers.
6. Last year OpenStreetMap celebrated 20 years. As someone who has been very active in OSM in many ways for a long time, where do you think the project will be in another 20 years?
That is a very tough question. What we now see is the advance of AI, often uncontrolled. For good and bad, this will influence the way we will be mapping. The expanding effect of Overture Maps may also be beneficial for OSM, but at the same time we need to make sure that OSM remains the central source and project. Over the last few months, at least within The Netherlands, there has been an enormous movement towards “Big Tech alternatives”, like X to Mastodon etc. In this, OSM is always touted as the alternative to the well-known proprietary providers, with apps like Organic Maps, OSMAnd, MagicEarth. If folks figure out “hey, I can add stuff myself”, that may increase the number of mappers. At the same time, as we already experience in open source GitHub repos, often AI-driven “vandalism” needs to be guarded against. But overall, if the OSM community sticks together, I foresee a bright future ahead!
Thank you, Just! Great work with the mapping and with the community building. Stay up to date on the project and Just by following him on Mastodon, or of course on the Alpumapa site.
Happy mapping,
Please let us know if your community would like to be part of our interview series here on our blog. If you are or know of someone we should interview, please get in touch, we’re always looking to promote people doing interesting things with open geo data.
Cloud-Native Geospatial represents a significant shift in how geospatial data is processed, stored, and analyzed. This approach offers GIS Professionals greater scalability, allowing them to handle massive datasets without relying on traditional and often limited on-premise infrastructure. Additionally, the cloud-native approach enhances collaboration by enabling multiple users to access and work on shared datasets in real-time, regardless of their physical location, helping to eliminate data silos. This level of accessibility and flexibility empowers GIS professionals to deliver faster results, streamline workflows, and adapt to the growing demands of modern geospatial applications.
Cloud-native geospatial refers to the practice of leveraging cloud-based technologies and architectures to handle geospatial data in the cloud, ideally without migrating it between heavy/purpose-built storage and file formats. This approach focuses on scalability, flexibility, and integration with modern cloud ecosystems to meet the growing demands for processing and analyzing spatial data. By adopting cloud-native principles, geospatial applications can take advantage of distributed computing, serverless architectures, high-capacity storage, and managed services offered by cloud providers, reducing operational overhead while improving performance.
Cloud-native geospatial enables the efficient use of large datasets by providing direct access to the section of the data you need without expensive clip operations. Additionally, complex geospatial processes can take advantage of distributed computing architectures, reducing the linear nature of traditional GIS workflows.
One prominent example of a cloud-native geospatial data format is Cloud Optimized GeoTIFF (COG). COGs are specifically designed for efficient access and use in a cloud environment, allowing users to retrieve only the portions of data they need rather than downloading entire files. This makes them ideal for handling large raster datasets, such as satellite imagery or digital elevation models.
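As an illustrative sketch (the URL, band and window size are placeholders rather than a real dataset), a windowed read of a COG over HTTP with rasterio looks roughly like this; only the tiles covering the requested window are fetched via HTTP range requests:

import rasterio
from rasterio.windows import Window

# hypothetical COG served over HTTP; rasterio/GDAL issue range requests under the hood
url = "https://example.com/imagery/scene_cog.tif"

with rasterio.open(url) as src:
    # read a single 512x512 pixel window of band 1 instead of downloading the whole file
    window = Window(col_off=0, row_off=0, width=512, height=512)
    block = src.read(1, window=window)

print(block.shape, block.dtype)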
Another widely adopted format is Zarr, which is commonly used for multidimensional array data. Zarr enables parallel and random-access data reading, making it particularly suitable for large-scale climate and weather datasets. Combined with cloud storage, Zarr allows researchers and professionals to collaborate on complex analyses without managing extensive, localized data copies. Parquet and GeoParquet are examples of cloud-native formats that simplify working with tabular geospatial data. Column-based storage - instead of the row-based approach of traditional GIS file formats - provides efficient compression, and these formats are well-suited for fast access and analysis of large vector datasets, including geometries and attribute tables. These formats, among others, help to optimize geospatial workflows for cloud architectures, promoting efficiency, interoperability, and innovation.
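As a minimal sketch, assuming a public Zarr store on object storage (the store path and the "t2m" variable are placeholders, and reading from s3:// requires the s3fs package), xarray can open the dataset lazily and pull only the chunks a computation actually needs:

import xarray as xr

# hypothetical Zarr store on object storage; storage_options is forwarded to fsspec/s3fs
store = "s3://example-bucket/climate/reanalysis.zarr"
ds = xr.open_zarr(store, consolidated=True, storage_options={"anon": True})

# lazy selection: no chunks are downloaded until .compute() (or similar) is called
monthly = ds["t2m"].sel(time="2021-01")
print(monthly.mean().compute())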
Integrating cloud-native geospatial formats into your GIS solution starts with selecting the appropriate tools and frameworks that align with your needs. Fortunately, many widely-used GIS platforms like ArcGIS and QGIS already support cloud-native formats because core geospatial libraries such as GDAL, geopandas, and R’s raster package support COG, Zarr, and GeoParquet. As support for cloud-native formats continues to grow, more GIS platforms will likely adopt these formats, allowing seamless integration of cloud-based data into your workflows, enabling you to take full advantage of the scalability and accessibility provided by cloud storage.
For example, when working with COGs, GDAL can access only the required portions of the images for your workflow rather than downloading full images, significantly reducing bandwidth and computation costs. Similarly, tools like Zarr-Python or xarray are excellent options for handling Zarr-formatted multidimensional datasets, offering powerful data analysis and visualization capabilities in cloud-centric environments. Zarr is also supported as a multi-dimensional raster format in ArcGIS.
Another critical step is configuring your GIS infrastructure to use cloud storage services effectively, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. These platforms offer APIs and SDKs for streamlined integration, making it easier to incorporate them into geospatial workflows to manage and process large datasets in real-time. Using cloud-based object storage enables the use of compute resources in the cloud, which can decentralize your geospatial workflows and offload complex processing that can tax traditional desktop and server resources.
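For instance (the bucket and object names below are purely illustrative, and the S3 backend requires the s3fs package), fsspec gives a uniform file-like interface over S3, Google Cloud Storage or Azure Blob Storage, so tooling can list and open objects without downloading them first:

import fsspec

# hypothetical public bucket; anon=True skips credentials for public data
fs = fsspec.filesystem("s3", anon=True)
for path in fs.ls("example-bucket/rasters/"):
    print(path)

# open one object as a streaming file handle; only the bytes actually read are transferred
with fsspec.open("s3://example-bucket/rasters/scene_cog.tif", "rb", anon=True) as f:
    header = f.read(1024)
print(len(header))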
Adopting these practices enables you to fully leverage the capabilities of cloud-native geospatial formats, ensuring that your GIS solution remains well-positioned to meet the dynamic requirements of modern geospatial applications.
Several database technologies are well-suited for supporting cloud-native geospatial formats, providing efficient querying, storage, and spatial data analysis. PostgreSQL with the PostGIS extension is a widely recognized solution, offering robust support for geospatial data types, functions, and compatibility with formats such as GeoJSON and WKT. While GeoPackage can be imported and exported using tools integrated with PostGIS, full compatibility may require additional processing. The Crunchy Bridge for Analytics also enables direct access to cloud-native formats using PostgreSQL.
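A minimal sketch of querying PostGIS from Python with psycopg 3 (the connection string, the parks table and the coordinates are hypothetical placeholders) might look like this; ST_DWithin on geography operates in metres:

import psycopg  # psycopg 3

# hypothetical connection settings and table
with psycopg.connect("dbname=gis user=gis_user host=localhost") as conn:
    rows = conn.execute(
        """
        SELECT name, ST_AsGeoJSON(geom)
        FROM parks
        WHERE ST_DWithin(geom::geography,
                         ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                         5000)
        """,
        (-78.78, 35.79),
    ).fetchall()

for name, geojson in rows:
    print(name, geojson[:60])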
Cloud-native query platforms such as Amazon Aurora (PostgreSQL-compatible), Google BigQuery, Snowflake, and Azure Cosmos DB deliver scalable solutions for analyzing geospatial data, often integrating seamlessly with cloud storage systems. These platforms offer native support for specific geospatial data types and functions, making them particularly suitable for modern geospatial applications.
For large-scale geospatial data processing, joins, and operations in distributed environments, technologies like Apache Spark, with geospatial add-ons such as Apache Sedona (formerly GeoSpark), are highly effective. Apache Sedona enables parallel processing and advanced spatial analysis, and can deliver up to 10X higher performance for complex queries. Similarly, cloud-native data warehouses or lakehouses such as Databricks (which uses Apache Spark) provide geospatial SQL capabilities, facilitating the analysis of formats like WKT and WKB alongside traditional datasets. The team behind Apache Sedona also offers Wherobots for serverless data warehouse/lakehouse compute built with a distributed computing architecture (Spark+Sedona) for highly optimized geospatial data workloads. DuckDB, while optimized for single-node analytics, also supports geospatial extensions and is ideal for high-performance local analysis.
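As a small, local-analysis sketch (not a distributed Sedona job; the polygon is just a toy geometry), this is roughly what DuckDB's spatial extension looks like from Python:

import duckdb

con = duckdb.connect()
# the spatial extension adds a GEOMETRY type and ST_* functions to DuckDB
con.execute("INSTALL spatial;")
con.execute("LOAD spatial;")

# toy sanity check: area of a unit square
area = con.execute(
    "SELECT ST_Area(ST_GeomFromText('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'))"
).fetchone()[0]
print(area)  # 1.0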
Selecting the appropriate database technology depends on the application’s specific requirements, including data volume, query complexity, and whether real-time or batch processing is needed. Organizations can build a robust and future-ready foundation for modern geospatial applications by integrating scalable database technologies with cloud-native geospatial formats.
Machine learning (ML) and artificial intelligence (AI) are integral components of the cloud-native geospatial ecosystem. Frameworks such as PyTorch (most popular) and TensorFlow are widely used for geospatial applications, enabling tasks such as predictive modeling, object detection, and land cover classification. These capabilities can provide accurate automated insights, supporting applications in fields like precision agriculture, disaster response, and climate change mitigation.
The diversity and volume of geospatial data products and GeoAI models derived from them can make it challenging to find relevant data products and reproduce model predictions. To address this problem, the cloud-native geospatial community is defining cloud-native data formats and taking the next step to create cloud-native standards for data catalogs and collections. These standards make it easier to search and associate geospatial data and models. The Spatio-Temporal Asset Catalog (STAC) specification is an industry standard for describing satellite imagery, aerial imagery, lidar, and other types of geospatial data. This standard includes extensions for describing different kinds of assets, including the Machine Learning Model (MLM) specification. The MLM provides searchable metadata that links model artifact files, model input requirements, hardware and runtime requirements, and associations to published STAC datasets to make it easy for machine learning frameworks to reproduce model inference and to make models searchable in model catalogs. You can learn more about the STAC ecosystem of standard and tooling at https://stacspec.org/en.
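As a hedged example of how a STAC search looks in practice (the endpoint, collection, bounding box and date range below are example values, not part of the MLM specification itself), pystac-client can query a catalog and inspect the returned items and their assets:

from pystac_client import Client

# example public STAC API endpoint; any STAC-compliant API works the same way
catalog = Client.open("https://earth-search.aws.element84.com/v1")

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-105.1, 39.6, -104.8, 39.9],
    datetime="2024-06-01/2024-06-30",
    max_items=5,
)

for item in search.items():
    # each item carries searchable metadata plus links to the underlying assets (e.g. COGs)
    print(item.id, item.datetime, list(item.assets)[:3])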
Cloud-native infrastructure is crucial in advancing GeoAI by enabling scalable and efficient data processing. Platforms such as AWS SageMaker, Google Cloud AI, and Azure Machine Learning facilitate seamless integration of ML workflows with geospatial data. Additionally, cloud-native geospatial formats, such as Cloud Optimized GeoTIFF (COG) and Zarr, ensure fast and efficient access to large geospatial datasets, reducing the latency associated with traditional data handling, especially when stored in cloud-native object storage.
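The practical effect of a COG is that a client can read just the window it needs over HTTP instead of downloading the whole file. Below is a minimal sketch with rasterio, where the URL is a placeholder for any HTTP-reachable COG.
import rasterio
from rasterio.windows import Window

cog_url = "https://example.com/data/scene_cog.tif"  # placeholder COG location
with rasterio.open(cog_url) as src:
    # Only the bytes for this 512x512 window (plus headers) are fetched,
    # thanks to the COG's internal tiling and HTTP range requests.
    window = Window(col_off=0, row_off=0, width=512, height=512)
    block = src.read(1, window=window)
    print(block.shape, src.crs)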
Beyond predictive modeling and classification, GeoAI is relevant to a wide range of analytical capabilities, including spatial clustering, route optimization, spatial interpolation, viewshed analysis, hot spot analysis, and spatial regression. These capabilities are further enhanced by the parallel processing power and distributed architecture offered by cloud-native technologies, making it possible to handle complex and large-scale geospatial data and workloads efficiently.
Organizations can build robust GeoAI solutions that address diverse and evolving geospatial challenges by leveraging cloud-native geospatial technologies and ML frameworks.
Cloud-Native Geospatial represents a transformative approach to managing, analyzing, and sharing geospatial data. Geospatial practitioners can achieve greater scalability, efficiency, and collaboration with cloud-based storage, computing, and modern data formats such as Cloud Optimized GeoTIFF (COG), Zarr, and GeoParquet. Integrating cloud-native geospatial formats into GIS workflows helps organizations remain flexible and capable of handling the growing demands of modern geospatial applications. With the right tools, infrastructure, and database technologies, cloud-native geospatial enables organizations to streamline operations and drive innovation in the geospatial field.
You can learn about cloud-native geospatial by exploring the Guide to Cloud-Optimized Geospatial Formats. Discover the essentials of cloud-native geo data formats and standards by watching the CNG 101 webinar—an excellent resource for an overview of CNG.
Want to get involved? Join the Cloud-Native Geospatial Forum (CNG) community and become part of the team pushing the envelope to bring cloud-native standards and technology to geospatial!
We are happy to announce that osm2pgsql maintainer Sarah has been selected as one of the fellows for the one-year pilot of the Sovereign Tech Fellowship programme of the Sovereign Tech Agency. The programme recognises and financially supports the significant amount of time that goes into the daily maintenance of an open-source project. Read more about the fellowship in Sarah’s post over at nominatim.org.
I’m happy to announce that I have been selected for the one-year pilot of the Sovereign Tech Fellowship programme of the Sovereign Tech Agency. The fellowship programme will support maintenance of Nominatim, Photon, osm2pgsql and pyosmium over the next year.
Participating in an open-source software project like Nominatim or Photon is not just about implementing fancy new features or clever algorithms to improve performance or the user experience. A lot of work happens quietly behind the scenes: user questions need to be answered and bug reports followed up. Dependent software needs to be monitored and updated as necessary. The project’s own code needs to be reviewed and polished regularly to prevent it from ageing and slowly falling apart. CI pipelines are a great tool for a maintainer, but they break with astonishing regularity and therefore need regular attention. Not to mention that a CI is useless without a set of well-maintained tests.
With its new Sovereign Tech Fellowship programme, the Sovereign Tech Agency recognises the importance of this maintenance work for the general functioning of the open source ecosystem. The programme will financially support my day-to-day tasks of software maintainership: responding to issues on Github, reviewing and merging pull requests, fixing reported bugs and addressing security issues, improving documentation and tests, preparing releases etc. On top of that there are some other not so glorious maintenance tasks that are planned for this year.
Nominatim’s import module relies on the datrie library, which has been unmaintained for some years and no longer compiles with the newest GCC compilers. We need to find a solution, either by switching to a new library or by taking over maintenance. A similar fate is likely in store for the testing library behave. With the latest stable release dating from 2018 and very little activity since, it is unclear how long it will remain functional with new versions of Python.
Photon saw the move to OpenSearch last year. This transition is far from finished. For example, there is still no proper support for using an external instance of OpenSearch instead of the embedded one. And ElasticSearch/OpenSearch itself has also seen some improvements in the versions we have skipped. It’s well worth investigating how geocoding can benefit from them.
These will, of course, not be the only things happening in 2025 for Nominatim and Photon. There will also be shiny new features and I have some ideas for improving performance and how to better handle the increasing complexity of OSM data. However, none of this can really happen, if the basis isn’t there and the general maintenance isn’t cared for.
Many thanks to the Sovereign Tech Agency for this great opportunity and the recognition of the importance of maintenance work for software.
If you want to support development and maintenance of Nominatim, too, please consider becoming a Github sponsor.
Panoramax is a rising star at the moment, for OSM but not only, while Mapillary, sold to Facebook, goes downhill. In my old www.OSMgo.org, the key P showed a Mapillary picture near the current position in the 3D-rendered OSM world. As that API is gone, I replaced it with Panoramax. I abandoned OSMgo years ago, but the server is still running. As I read more and more about Panoramax, I decided to dig out the old code and use it. First I asked in the Fediverse for help with the API and got a great and fast response, even a good example, thank you all!
There are almost too many pages about it. The real API was a bit hidden, but well documented in the end. The idea of decentralised servers with a central directory is great, and so is the web UI for browsing all the pictures. The API gave me a JSON list of the closest pictures, including links to the images. After working out how to define the radius (place_distance) and get the direction of the “shot” (feature.properties[“view:azimuth”]), my old Mapillary code could be modified to show the picture in the 3D view and move the camera to see it.
You may notice that the controls of OSMgo are nasty. The whole project is outdated and incomplete. I am considering redoing it, maybe in Rust or Zig; anyone interested in joining? One missing feature in OSMgo is roof types such as gabled. There are more OSM 3D renderers, and I have started investigating how they solve my obstacles. A compact one is this building viewer example: US Capitol
Before I start on new code, I want a detailed definition of how 3D rendering is done in the open source projects: OSM Buildings, OSM2World, and the plugins for JOSM and Blender. Do you know more? There are some points to talk about in the OSM forum, such as how to render a dome or an onion roof: as a circle, or according to the nodes of the way it is tagged on. The OSM Wiki may get more details/pages. With this knowledge, OSMgo could get some improvements too.
A Rust crate may be the next goal. It would take an area of OSM data and create a 3D scene; whether it is rather simple or good-looking would be controlled by options. That scene could be sent directly to the GPU or stored in a glTF file.
Cartopareidolia is the phenomenon of seeing people and animals in maps.
The post Cartopareidolia: Seeing People and Animals in Maps appeared first on Geography Realm.
Following up on my last post, I wanted to share some more details about the experience of using AI tools to code a plugin for QGIS, one that has seen some reasonable success, with over 2000 downloads in the past couple of months. My hope is to inspire others to make their own QGIS plugins and other geospatial tools, as I think more people doing AI-assisted coding has the potential to accelerate the momentum of the open source ecosystem.
Cursor & QGIS — awesome together :)
Before we dig in I want to give everyone who is not a coder some encouragement to jump in and try things out. The quick answer is yes! You can code a QGIS plug-in even if you’re not a software developer. I’m sure you’ve seen the videos of people building cool things with AI tools, but it can still be hard to actually dive into it. For me the most important thing is to have a real problem you’re trying to solve. I could never follow those tutorials about ‘making an e-commerce store’ since I just don’t care about making an e-commerce store. I could follow the instructions and get a generic thing, but I wouldn’t actually learn much. But when I’m trying to solve something specific that I care about it becomes much easier, because I really want the result.
I’m guessing you have some interest in geospatial in general, and likely QGIS specifically, if you’re reading this post. So the top thing I’d encourage you to do is to think about something you’d like QGIS to do that it doesn’t do for you today. This could just be a common workflow that you do all the time, or it could be some cool new functionality you always wished it had. And if you want to start even easier, think about some processing of files that doesn’t even use QGIS: a basic Python program that processes geospatial data with GDAL/OGR or GeoPandas can be even easier than a QGIS plugin.
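To show how small that “no QGIS needed” starting point can be, here is a hedged GeoPandas sketch; the file names, the projected CRS and the 4-hectare threshold are made up for illustration.
import geopandas as gpd

# Read any OGR-supported file (GeoPackage, Shapefile, GeoJSON, ...).
parcels = gpd.read_file("parcels.gpkg")

# Reproject to a metric CRS (EPSG code chosen for the example) so areas are in square metres,
# then keep only parcels larger than 4 hectares.
parcels = parcels.to_crs(epsg=32723)
big = parcels[parcels.geometry.area > 40_000]

big.to_file("big_parcels.gpkg", driver="GPKG")
print(f"kept {len(big)} of {len(parcels)} parcels")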
After I made the GeoParquet downloader QGIS plug-in I had one of the best developers I know reply with a post on bluesky:
I responded that 99.5% was AI-coded — so Matthias still has to believe me when I downplay my coding skills :) I don’t actually write the code, I just instruct the AI what I want it to do. I am learning more, and at this point I perhaps could write more of the code, but I’d rather just get it right the first time than try to memorize the syntax or struggle when I mistype something.
Now, I am not without any coding experience, and I’ll share my background a bit below, so I’m not yet ready to say ‘anyone can make a QGIS plugin’. I do think there is a decent chance that the process of breaking problems down and iterating through debugging is likely a skill I retain that may not come instantly to someone with no coding experience. But I do think if you have a problem you want to solve and you are resilient then you can just use the chat interface to teach you everything you need to know — you just need to keep asking and aim to really understand it. No matter what you’ll learn something, and if it doesn’t work out today I’m confident that it won’t be long until anyone who is motivated can do it.
Since I started writing this article I did get a great proof point on how easy it can be:
I wanted to share a bit more about my background, and how I have recently started programming again thanks to the power of AI tools. I started my career coding, serving as the first lead developer of GeoServer for a couple of years in 2002, learning a ton from a number of great early members of the GeoTools community — shout outs to Ian Schneider, Gabriel Roldan and Andrea Aime! But after less than two years I realized they were all better software engineers than me, so I (eventually) recruited all of them to work on GeoServer at OpenGeo, and I focused on community building (the fun part) and bringing in money to support the software (the less fun part, but a really essential one).
I found it really hard to do ‘both’, as coding was too fun and satisfying — I know of few other things you can get paid for where the day just flies by. So I cut myself off from coding in order to figure out all the other things needed to turn a growing open source project into a successful ‘business’. It ended up being a lot of fun, and I learned a ton, but I did miss the act of creating software. After a few years, when I felt confident in things like ‘product management’, ‘business development’ and ‘managing’, I’d periodically try to code and it’d just be too frustrating. It’d take a couple of hours just to get a few lines of barely working code down, as I’d first need to get my coding environment all set up, and then would struggle to remember basic syntax and would need to look up almost every call.
When ChatGPT came out it had been over 20 years since I’d seriously programmed, and I had under two years of total experience coding professionally. My first couple of attempts to use ChatGPT to code didn’t quite work, as it’d just get too much wrong to be worth it. But sometime around GPT-3 I wanted to explore Google Open Buildings & GeoParquet and had my first success. As long as I gave it small, constrained tasks to process the data with GDAL/OGR or GeoPandas it’d do amazing. 75% of the time it’d give me a perfect result, 20% it’d get a bug but you could feed it the error and in 2–3 iterations it’d fix it. And 5% of the time it’d get stuck in a loop, trying to fix things but going back to the previous bad way, which was frustrating. But overall it felt ‘worth it’ and I was able to make far faster progress using it than not. Since then I’ve been taking on more ambitious projects as the LLM’s and tooling around them has improved, and this year I’m aiming to spend at least 50% of my time doing AI-assisted software development.
Before I dig into my experiences building the plugin I first want to share a bit about what tools are working for me, and some ideas and recommendations for how to think about what to use. The first thing I will say is that if you’re coding you should absolutely pay instead of using the free versions. If you’re less experienced with coding then it’s harder to work through things when the LLM doesn’t get things right. And the latest models that you get from paying absolutely get things right more often. It’s hard to put a number on how much better they are, but my feeling is that even if it’s only 10–20% better then it can easily save you hours of frustration, and that having that frustration when you’re getting started can easily turn you off from pushing further. And in my experience it’s more than 20% better.
My primary tool for these types of projects is Cursor. I only discovered it after a number of months successfully coding with ChatGPT, but a number of great coders I work with said it was amazing, so I thought I’d give it a shot. I believe it’s the fastest I’ve gone from starting a free trial to deciding ‘yup, I’m absolutely going to pay for this’ — it was maybe 15–20 minutes. The key feature that moved me to buy is actually different than the one more advanced coders love. It’s the ability to generate ‘diffs’ on the code that is generated by the LLM.
Diffs in Cursor for the latest improvements for my plugin
You can see in the right side panel the code that gets generated by an LLM. It’s totally fine if you just take that code and use it, but if there’s a problem or if you want to add something to the code then it gets challenging to use the whole set of code. You either have to copy and paste the entire file each time, or you have to successfully spot every difference and copy over each line right. And in this example the LLM isn’t even generating the full file; you can see it says // ... existing code ... so when that happens you’d need to figure out the right place to insert your new code. Before Cursor I had that go wrong enough times that I’d ask the LLM to always generate the entire code, and then would paste the whole thing in each time. But then it’d get unwieldy if you had a larger file, and it would also slow things down since it’d spend a lot of time reprinting things.
This ability to apply diffs was an absolute game changer in my productivity — you can have confidence that every difference between the new code and the current code is addressed. Now, this isn’t to say that the diffs are always great. Sometimes it messes things up, or redoes things that you don’t want it to redo. But the UI of Cursor is such that you can go through each of the diffs and hit ‘y’ or ‘n’ to accept or reject it.
I often will scan each diff to make sure it’s doing what I thought it would. Though often I’ll just apply the whole thing, run the file, see the results, and then just go back to examine the diffs if things didn’t work.
The other killer feature of Cursor, that expert coders love, is the ability to give the LLM the context of your entire codebase, not just the file you’re looking at. The QGIS plugin I made is a single file, so for this project that feature doesn’t matter much, but for my geoparquet-tools project I’m trying to do some better programming practices of splitting things up, and it’s been great for that.
Cursor lets you select from different models, and it’s got access to most of the latest. It doesn’t have the most expensive ones, like o1, and you have some limit for how many calls you get to the latest. But you can enter an API key for your own LLM account if you want to have it use truly the latest models. I actually switched for this project to Claude Sonnet 3.5, and it’s now my go to within Cursor. I can’t say that it’s definitely better — it’s just that 4o was frustrating me and getting things wrong, and when I tried switching the model to Claude it nailed it, and I’ve just stuck with it since. The ‘best’ here is constantly changing, so it’s worth checking out leaderboards (like aider.chat or https://livebench.ai).
And I also use ChatGPT Plus occasionally. I don’t think it’s essential — if you want to stick with one tool I’d go with Cursor (or Windsurf, which Evan found great success with, and it may be even better for those with little experience coding). You can use its chat interface to ask questions, just like you would with ChatGPT — you don’t have to have it give you code responses, you can just ask for background, how to do things, etc. For coding stuff I tend to use the o1 model, and I mostly just use it for things where the other models are just looping on bad answers (more about that later). And it can be useful for planning things out and suggesting an overall approach to things.
The explosion of useful tools in this space is incredible, so I imagine there will be some new tool set that’s even better before too long. I plan to re-evaluate every six months or so, but I encourage you to just try things out. Just please do pay for one, and you’ll likely have a better experience. Perhaps in a couple of years the free tools will be more than sufficient, but that’s not the case right now.
To be honest my successful coding of this plugin was a bit of a lark. As I mentioned in the previous post I’d been intimidated by QGIS coding — I think I looked into it once before and it was just too many things to learn. So I really wasn’t expecting it to work, as my AI successes in the past had all been on very discrete things, and when I tried to get too ambitious it’d get challenging. But I just asked it to make a plugin, and then asked how I run it to try it out, and it started working.
The key is to never ask for too much — always start small, and just ask for one more thing each time. Sometimes I’d try to ask for many things at once, and it’d get some of them, but then it was a lot more difficult to try to fix the one thing that was wrong, since too many things were introduced at once. So it’s best to just keep asking for tiny little improvements.
To start I just asked for a plugin that popped up a dialog to make sure that worked. I had to ask how to actually install it — you just zip it up and then you can ‘install zip file’ from the extensions folder, or you can copy it directly into the plugins folder. Then I just added more and more — first I hardcoded a specific Overture file to download, and got it so DuckDB ran the query to get it. I had experience with DuckDB so I knew what I wanted from the query, but LLM’s are really great at SQL, so it’s easy to have it form the query for you.
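For flavour, the query shape involved is roughly the following. This is a rough, hypothetical sketch rather than the plugin’s actual code: the S3 path is a placeholder (real Overture releases sit under versioned prefixes), and the bbox struct column is an Overture convention rather than a general GeoParquet guarantee.
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; INSTALL httpfs; LOAD spatial; LOAD httpfs;")

# Placeholder source; substitute a real Overture (or other GeoParquet) location.
src = "s3://example-bucket/overture/theme=places/*.parquet"
xmin, ymin, xmax, ymax = -122.35, 47.58, -122.29, 47.63  # example bounding box

con.execute(f"""
    COPY (
        SELECT *
        FROM read_parquet('{src}', hive_partitioning=1)
        WHERE bbox.xmin > {xmin} AND bbox.xmax < {xmax}
          AND bbox.ymin > {ymin} AND bbox.ymax < {ymax}
    ) TO 'places_subset.parquet' (FORMAT PARQUET)
""")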
One thing I have learned is that the LLM’s do much, much better on well-established, well-documented tools. SQL is just incredible — I like SQL, but was never great at it, and now I feel like it’s a super power. I can just ask the LLM for all kinds of crazy analysis and it’ll make these complicated joins that do exactly what I want. GDAL/OGR and GeoPandas are both quite good. But often newer features aren’t as solid, as the LLM may have been trained before they became widely documented and used. But you can instruct it about them, usually I just paste the documentation in directly. So if you want to use some totally obscure tool then it can struggle — though always with a positive helpful and confident attitude, it just makes up whatever you want.
But I was pleasantly surprised to find that QGIS plugins are something they all know well. Which makes a lot of sense, as there is a very large ecosystem, so lots of code and documentation for it to learn from. Throughout the process it came through with good answers to things that I thought it might struggle with. One example was that it became clear that my plugin was taking over the whole program, and QGIS would just stop doing anything else. So I just asked if there was a way to make it so that didn’t happen, and it said that I could run worker threads and then came up with all the code needed for that. I also wanted to open multiple dialogs, add links you could click on, and report errors out in different ways, and it did all of that well. Overall it handled most everything QGIS related with ease.
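For anyone curious what the “worker threads” suggestion looks like in QGIS terms, the API provides QgsTask for background work. The following is a minimal sketch of that pattern with hypothetical names, not the plugin’s actual code.
from qgis.core import QgsApplication, QgsMessageLog, QgsTask

def slow_work(task, url):
    # Runs on a background thread; must not touch GUI objects directly.
    # ... do the long-running part here (download, DuckDB query, etc.) ...
    task.setProgress(100)
    return {"url": url}

def on_finished(exception, result=None):
    # Called back on the main thread when the task completes.
    if exception is None and result:
        QgsMessageLog.logMessage(f"Done: {result['url']}", "MyPlugin")

task = QgsTask.fromFunction("Download data", slow_work,
                            url="https://example.com/data.parquet",
                            on_finished=on_finished)
QgsApplication.taskManager().addTask(task)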
My plugin, now available to anyone directly through their QGIS!
I will admit that I definitely do not understand most of what’s going on. With my smaller python programs I’d always have a good sense of what was happening in most lines, but the structure of these is more complicated. But the cool thing is that you can just highlight code and ask it to explain it if you want to understand more.
My biggest struggles came with getting the dependencies right. DuckDB is essential for everything to work, but it’s not installed in QGIS by default. The QDuckDB plugin did an amazing job packaging things for Windows, so for those users I recommended that they just install QDuckDB, and then I linked to their instructions for other users. For the next release we tried to make it so the plugin would automatically install DuckDB using pip, and that was definitely my single biggest struggle. Matt Travis, the first outside contributor to the project, coded what looked like a great way to do it. But then it didn’t work on one operating system, so I tried another way, and that didn’t work on mine, and it was just a pain. I’m still not sure what the ‘ideal’ route is for QGIS plugins to pull in dependencies; it seems users are often asked to manually update their Python environments.
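For reference, a naive approach plugins sometimes try is shelling out to pip from inside QGIS. The sketch below is hypothetical and illustrates exactly why this is fragile: on some installs sys.executable points at the QGIS binary rather than a Python interpreter, so the right interpreter has to be found first.
import subprocess
import sys

def ensure_duckdb():
    try:
        import duckdb  # noqa: F401  already available, nothing to do
        return True
    except ImportError:
        pass
    # Naive attempt: ask "our" interpreter to pip-install the wheel.
    # Caveat: inside QGIS, sys.executable may be the QGIS executable itself,
    # which is why this works on some platforms and fails on others.
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "duckdb"])
        return True
    except (subprocess.CalledProcessError, OSError):
        return False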
I would occasionally have the LLM get stuck in a frustrating loop. Usually it’d get the code right immediately, but about 15% of the time it’d generate an error. But the awesome thing is that usually you can just paste in the error that results back into the chat and it’ll realize what is wrong and fix it. But occasionally it’ll give you a fix that won’t work, and you paste the error back in and then its next fix will suggest something different that also doesn’t work. And when you paste those results back in it’ll give you the first fix again. And then it’ll just loop between the two things, getting it wrong each time. I’ve found the best thing to do in this situation is to try another LLM. Cursor makes it easy to just swap in a different model, going between Claude and OpenAI. These days I usually just jump to o1 through ChatGPT Plus, as it very often gets it right — I just paste in my full code and the problem (and right before publishing: o3-mini-high seems even better, and can be called directly from Cursor).
The other way to get out of the loop is to suggest different approaches to the problem. Sometimes that means going back a few steps and directing it in a different way. Sometimes it means reading up online on other options that people use (I suppose you could also ask the LLM for other approaches; I think that has sometimes worked for me). Occasionally I will bug my coder friends, and the nice thing is that usually you just need the name of a different approach from them and the LLM will take it from there.
So overall the experience of coding my first QGIS plugin was incredibly pleasant. It didn’t all happen ‘automatically’, we’re not (yet) there with these tools, and I suspect they’re almost always going to require some guidance and iteration. But the challenges were all surmountable. I think the biggest thing these tools do is make the learning curve a lot less steep. You’re still going to have to dig in and learn quite a bit, but you get much more immediate positive feedback. More recently I started another plugin and I will admit that it was more of a challenge, and I got seriously stuck for more than two hours, so I suspect there’s still some luck involved in having it be easy.
In the time between writing this and actually posting it I have hit bigger bugs than I had before — one took almost 4 hours to resolve, and the other almost two hours. It reminded me of the frustrating parts of coding, and is proof that it’s not all magical. But it is so satisfying when you get past one of these bugs. I hope everyone finds early success without hitting frustrating bugs, but at some point you will hit frustrations, and I encourage you to be hard-headed and just keep on trying until you get it working.
So I do want to encourage everyone who has read this far to at least try. Get a trial of Cursor or Windsurf and set some time aside. Be sure to start with a problem you actually want to solve, something that will make your life easier. And just keep trying even if at first you don’t succeed; it is currently easier than it’s ever been, and that will only improve. It doesn’t have to be QGIS, but it’s a nice platform to build upon. I’m also going to try to record some videos of building a plugin, to help demystify things even more.
If you do build something that’s useful to you then please share the code on GitHub and publish to the QGIS plugin repository. Chances are it might be useful to someone else. I know it can be scary, I still remember the first time my boss told me I needed to push my code to be open source. But honestly no one is looking at your code and judging it — they’re just psyched to have something potentially useful, and psyched that you contributed something positive to the world.
Good luck! And if you want to dive in but don’t have a project that immediately jumps to mind, I do welcome all AI-assisted contributions to my QGIS plugin. I tagged a number of ‘good first issues’ and can easily add more, and am more than happy to offer advice & help to anyone looking to contribute.
Also, if you’re not signed up already do come to the Cloud Native Geo conference in Utah. It’s going to be awesome.
The YouthMappers UFRJ chapter is coordinated by Prof. Dr. Raquel Dezidério Souto (in a post-doctoral internship) and Prof. Dr. Manoel Fernandes, both from the Laboratory of Cartography of the Federal University of Rio de Janeiro (GeoCart-UFRJ, Brazil), and was set up to prepare new mappers and develop collaborative mapping research with OpenStreetMap and related programs, involving the community, with members from inside and outside UFRJ, residing in Brazil and in other countries.
The initiative is part of the international YouthMappers network, a long-term, global project founded by professors from the Universities of Arizona and Texas, with sponsorship and support from USAID (USA). There are currently more than 400 groups in public universities around the world.
The theme chosen to begin the development of mapping activities in the state of Rio de Janeiro (Brazil) is disaster risk reduction (DRR), with the risk areas of the state as the areas of interest. Other themes, such as mapping trees, evaluating housing conditions in areas of special social interest, or developing web applications, are being considered by us and our partners and are still at the development stage.
In this way, the YouthMappers UFRJ chapter aims to develop research using collaborative data and secondary official data, as a means of providing informational support for the state of Rio de Janeiro (Brazil), contributing to decision-making and other purposes, and to contribute to the development of other YouthMappers chapters in the state of Rio de Janeiro and in other states of Brazil.
The YouthMappers UFRJ chapter is part of the HUB YouthMappers Rio de Janeiro, a consortium of the international network's chapters present in the state: UFRJ - UFRRJ - UERJ - UFF Niterói - UFF Campos (status as of 31-01-2024), technically coordinated by IVIDES.orgⓇ.
Map of the areas of interest (AOI) of the infrastructure mapping project for disaster risk reduction in Maricá, RJ Brazil. Data © OpenStreetMap contributors. Areas delimited by Prof. Dr. Alessandra de Freitas (POLI-UFRJ).
Our initial collaborative project is focused on mapping infrastructure for disaster risk reduction (DRR) in the municipality of Maricá (Rio de Janeiro, Brazil). The municipality has been suffering from episodes of flooding, landslides and coastal erosion, due to irregular occupation, spreading to areas that are unsuitable for housing, as well as an increase in the frequency and intensity of disastrous events in the region.
The project adopts collaborative techniques such as mapping with OpenStreetMap and interactive mapping with uMap, a French project that provides a platform for developing interactive web maps, free of charge and hosted in the cloud. It also makes its source code available. No annual fees, no requests for add-ons, no IP tracking… using uMap!
MapRoulette screenshot. Data © OpenStreetMap contributors.
To provide technical support to the YouthMappers UFRJ chapter, the Virtual Institute for Sustainable Development - IVIDES.orgⓇ is preparing educational material on mapping with OpenStreetMap, carrying out training sessions, and creating collaborative mapping projects that allow mapping activities to be controlled, for example by preventing more than one mapper from editing the same area. This arrangement makes it possible to organize the tasks, and therefore the mapping project as a whole, and to preserve the data from corruption. In addition, the data can be validated within the same project, and access levels can be configured according to mapper experience (beginner, intermediate, advanced), which avoids mistakes commonly made by beginners.
Screenshot of the osm.org map with Brazil at the centre. Data © OpenStreetMap contributors.
A complete Mapping with OpenStreetMap training course was published in August 2023 and became a continuous-entry course (enrolment at any time of the year), reaching 384 participants (as of 31 January 2025), residing in Brazil and other countries. The course remains online as a self-learning platform, with certificates issued every six months, based on a satisfactory evaluation of the results submitted by participants.
osm.org screenshot, showing land cover features mapped online. Data © OpenStreetMap contributors.
In 2024, thematic mapping workshops with OpenStreetMap were held online, covering the mapping of points of interest (POI), hydrography, land cover, and features related to DRR, as well as preparing official data and importing it into OSM in small batches.
Adding the participants of the thematic mapping workshops to the audiences of the standalone workshops held with local groups at various public and private universities in Brazil and in other countries (Mexico and Mozambique), nearly 700 certificates were issued by IVIDES.orgⓇ!
To find out how to schedule a one-day workshop at your organization, please contact [email protected].
First slide of the lightning talk given at SotM 2024, Nairobi (Kenya), by Dr. Raquel Dezidério Souto.
The interactive map project, developed with OSM, uMap and WordPress for mapping DRR infrastructure in the municipality of Maricá, RJ, Brazil, was presented at State of the Map 2024, the main global conference of the OpenStreetMap community, held that year in Nairobi, Kenya.
© IVIDES.org
In 2024, IVIDES.orgⓇ conducted a collaborative mapping campaign for the Taquari-Antas river basin region (Rio Grande do Sul), as part of Brazil’s efforts to produce data to support the disaster response and post-disaster phases.
In order to disseminate the scientific results, in September 2024 the Institute promoted the Scientific Seminar for Rio Grande do Sul, in partnership with the YouthMappers chapters UFRJ, UERGS and Unipampa, and with support from Wiki Movimento Brasil (WMB) for the live broadcasts and the sponsorship of the raffled gifts.
On the event page, you’ll find the PDF and video files of the lectures with professors from three public universities in the State of Rio Grande do Sul - UERGS, UFRGS and FURG. Or directly on Wikimedia. Also, some Web maps were generated very quickly with uMap, in order to support humanitarian and government actions during the disaster and the post disaster activities.
© OpenStreetMap
IVIDES.orgⓇ held an event to celebrate 20 years of OpenStreetMap, in partnership with the HUB YouthMappers Rio de Janeiro chapters and with support from TomTom for the distribution of gifts. As part of the programme, five collaborative mapping workshops with OpenStreetMap were held on different themes, in partnership with the universities Unipar (Paraná), Unipampa (Rio Grande do Sul), COPPE-UFRJ (Rio de Janeiro) and UFF Campos (inauguration of the UFF chapter in Campos dos Goytacazes, RJ).
To find out more about our work, visit the YouthMappers UFRJ project portal.
– With information from the coordination of YouthMappers UFRJ, Rio de Janeiro, February 1st, 2025.
Translated with Deepl (free version) and with human validation.
IVIDES.orgⓇ is a registered trademark.
[email protected] | https://ivides.org/
Using OpenStreetMap (OSM) and JOSM (Java OpenStreetMap Editor) has completely transformed my perspective on places in Ghana. What used to be just names on a map are now vibrant locations I explore, analyze, and contribute to in meaningful ways.
With each mapping session, I am a digital explorer, uncovering hidden details about my country. While tracing highways, POIs, buildings, and rivers to ensure that every corner of Ghana is well represented, I explore! From the bustling streets of Accra to the serene landscapes of the Volta Region, my virtual travels take me everywhere without even leaving my seat.
As I explore and see places, I contribute data to solve real-world problems. Through OSM, I have contributed to flood risk assessments, improved accessibility to schools, and even helped emergency responders find critical locations. It’s amazing to know that my little edits can make a big difference in someone’s life.
Mapping is no longer just a hobby; it’s a passion, a responsibility, and a way to make a mark on the world.
One edit at a time! The journey continues!
What might stand out about my profile is the fact that I have been blocked before. OSM’s electronic pillory never forgets that. Whether my punishments were justified or not is not the point here. The point is that while OSM does many things well, it does one thing really badly: the blocks. They make a mockery of all the principles otherwise held high, such as fairness, transparency and participation.
They are imposed at the whim of individual members of the so-called Data Working Group, like a deus ex machina, without hearing the accused, without any possibility of appeal or defence, and without a reasoned reply to a follow-up enquiry.
This opens a barn-sized gateway to potential arbitrariness. On top of that, an exponential growth of future punishments is threatened in a high-handed manner. In real life that would mean a thief with a few prior convictions being jailed for years for his next offence, even if all he pinched was a bottle of schnapps. Judicial terror as a model of deterrence, as known from authoritarian states. The use of the plural personal pronoun (“we”) by the person responsible in his “verdict” can accordingly be regarded as an (unfit) attempt to manufacture a veneer of legitimacy.
Of course, the consequences of OSM’s approach are in no way comparable to those in the tyrannies mentioned. But the underlying way of thinking is very similar. Amusingly, in my case it was applied by someone who, judging by his blog posts, appears to be a champion of woke culture.
In any case, the opaque setting of blocks and of their length is and remains a big stain on OSM.
Another one got ignored today, another road left unmapped, another place erased because it wasn’t profitable enough to exist on a corporate map. No one noticed, because no one was supposed to.
They don’t talk about the missing footpaths, the streets that don’t appear because they aren’t in a government database, the communities left invisible because they don’t generate ad revenue. They don’t talk about how your map—your view of the world—is decided not by truth, but by business interests.
They call us idealists, hobbyists, dreamers. They say the world has already been mapped. But they are wrong.
We are the ones who see the gaps. We are the ones who refuse to let our neighborhoods, our histories, our stories be erased. We are the ones who put the world on the map—not for profit, but for people.
Yes, I am a mapper. My crime is that of curiosity. My crime is refusing to accept a world where only what is profitable is visible. My crime is knowing that no company, no government, no algorithm should have the power to decide what exists.
You may ignore us. You may try to replace us with AI, to wall off geography behind paywalls, to tell people that their contributions don’t matter. But you can’t stop us all. Because the world is ours to map.
After all, we are all alike. We are OpenStreetMap.
🤣 with apologies to ++The Mentor++ (8 January 1986). “The Conscience of a Hacker”. Phrack, Inc. 1 (7): 3 of 10 - wikipedia
Explore viewers for visualizing GIS vector data in shapefile format, available for both desktop and web browsers.
The post Shapefile Viewers appeared first on Geography Realm.
I have been watching the codification of spatial data types into GeoParquet and now GeoIceberg with some interest, since the work is near and dear to my heart.
Writing a disk serialization for PostGIS is basically an act of format standardization – albeit a standard with only one consumer – and many of the same issues that the Parquet and Iceberg implementations are thinking about are ones I dealt with too.
Here is an easy one: if you are going to use well-known binary for your serialization (as GeoPackage and GeoParquet do) you have to wrestle with the fact that the ISO/OGC standard for WKB does not describe a standard way to represent empty geometries.
Empty geometries come up frequently in the OGC/ISO standards, and they are simple to generate in real operations – just subtract a big thing from a small thing.
SELECT ST_AsText(ST_Difference(
  'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'::geometry,      -- the small thing
  'POLYGON((-1 -1, 3 -1, 3 3, -1 3, -1 -1))'::geometry  -- the big thing
));  -- the result is an empty geometry
If you have a data set and are running operations on it, eventually you will generate some empties.
Which means your software needs to know how to store and transmit them.
Which means you need to know how to encode them in WKB.
And the standard is no help.
But I am!
All WKB geometries start with a 1-byte “byte order flag” followed by a 4-byte “geometry type”.
enum wkbByteOrder {
wkbXDR = 0, // Big Endian
wkbNDR = 1 // Little Endian
};
The byte order flag signals which “byte order” all the other numbers will be encoded with. Most modern hardware uses “least significant byte first” (aka “little endian”) ordering, so usually the value will be “1”, but readers must expect to occasionally get “big endian” encoded data.
enum wkbGeometryType {
wkbPoint = 1,
wkbLineString = 2,
wkbPolygon = 3,
wkbMultiPoint = 4,
wkbMultiLineString = 5,
wkbMultiPolygon = 6,
wkbGeometryCollection = 7
};
The type number is an integer from 1 to 7, in the indicated byte order.
Collections are easy! GeometryCollection, MultiPolygon, MultiLineString and MultiPoint all have a WKB structure like this:
wkbCollection {
byte byteOrder;
uint32 wkbType;
uint32 numWkbSubGeometries;
WKBGeometry wkbSubGeometries[numWkbSubGeometries];
}
The way to signal an empty collection is to set its numGeometries value to zero.
So for example, a MULTIPOLYGON EMPTY would look like this (all examples in little endian, spaces added between elements for legibility, using hex encoding).
01 06000000 00000000
The elements are:
01: the byte order flag, 1 = little endian
06000000: the geometry type, 6 = MultiPolygon
00000000: the number of sub-geometries, zero, which is what makes the collection empty
The Polygon and LineString types are also very easy, because after their type number they both have a count of sub-objects (rings in the case of Polygon, points in the case of LineString) which can be set to zero to indicate an empty geometry.
For a LineString:
01 02000000 00000000
For a Polygon:
01 03000000 00000000
It is possible to create a Polygon made up of a non-zero number of empty linear rings. Is this construction empty? Probably. Should you make one of them? Probably not, since POLYGON EMPTY describes the case much more simply.
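As a quick sanity check on those byte strings, the header-plus-zero-count layout can be packed with nothing but the Python standard library (a small illustrative sketch, not PostGIS code):
import struct

def empty_wkb(type_code):
    # 1-byte byte order flag (1 = little endian), 4-byte type code, 4-byte zero count.
    return struct.pack("<BII", 1, type_code, 0)

print(empty_wkb(6).hex())  # MultiPolygon empty -> 010600000000000000
print(empty_wkb(2).hex())  # LineString empty   -> 010200000000000000
print(empty_wkb(3).hex())  # Polygon empty      -> 010300000000000000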
Saving the best for last!
One of the strange blind spots of the ISO/OGC standards is the WKB Point. There is a standard text representation for an empty point, POINT EMPTY. But nowhere in the standard is there a description of a WKB empty point, and the WKB structure of a point doesn’t really leave any place to hide one.
WKBPoint {
byte byteOrder;
uint32 wkbType; // 1
double x;
double y;
};
After the standard byte order flag and type number, the serialization goes directly into the coordinates. There’s no place to put in a zero.
In PostGIS we established our own add-on to the WKB standard, so we could successfully round-trip a POINT EMPTY through WKB – empty points are to be represented as a point with all coordinates set to the IEEE NaN value.
Here is a little-endian empty point.
01 01000000 000000000000F87F 000000000000F87F
And a big-endian one.
00 00000001 7FF8000000000000 7FF8000000000000
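A quick way to confirm that both byte strings above encode the same thing is to decode them with Python’s struct module (again just an illustrative sketch, not part of any standard):
import math
import struct

def decode_point_wkb(hex_str):
    raw = bytes.fromhex(hex_str)
    order = "<" if raw[0] == 1 else ">"   # 1 = little endian, 0 = big endian
    wkb_type, x, y = struct.unpack(order + "Idd", raw[1:])
    return wkb_type, x, y

for example in ("0101000000000000000000F87F000000000000F87F",
                "00000000017FF80000000000007FF8000000000000"):
    t, x, y = decode_point_wkb(example)
    # Both decode to type 1 (Point) with NaN coordinates, i.e. POINT EMPTY.
    print(t, math.isnan(x), math.isnan(y))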
Most open source implementations of WKB have converged on this standardization of POINT EMPTY. The most common alternate behaviour is to convert POINT EMPTY objects, which are not representable, into MULTIPOINT EMPTY objects, which are. This might be confusing (an empty point would round-trip back to something with a completely different type number).
In general, empty geometries create a lot of “angels dancing on the head of a pin” cases for functions that otherwise have very deterministic results.
Over time the PostGIS project collated our intuitions and implementations in this wiki page of empty geometry handling rules.
The trouble with empty handling is that there are simultaneously a million different combinations of possibilities, and extremely low numbers of people actually exercising that code line. So it’s a massive time suck. We have basically been handling them on an “as needed” basis, as people open tickets on them.
SQL Server, for example, converts POINT EMPTY to MULTIPOINT EMPTY when generating WKB.
SELECT Geometry::STGeomFromText('POINT EMPTY',4326).STAsBinary()
0x010400000000000000
Other implementations simply return NULL when asked for the WKB of POINT EMPTY.
SELECT ST_AsBinary(ST_GeomFromText('POINT EMPTY'))
NULL
I’m back home! All the uploads done! Yeah!
As I had written about earlier, I was on tour (with, to be precise, one of the bands I’m in) under the title/program “The Dubliners Experience” in the Netherlands from Jan 15th to Feb 1st. The GoPro Max was our constant companion on the roof of the tour bus. It covers mostly motorways and the areas around concert venues, of course. I also walked around the campsite we stayed at with it (band life isn’t as glamorous as they make it out to be in the movies after all), but that imagery is not super useful, I’m afraid. How much can you map in a fen, when there are not even leaves on the trees to map species… But still, the area got covered.
I was especially keen to upload to Panoramax, because the coverage was quite poor, which is not to discredit the people who have already contributed, of course!
I’ll give you some before and after screenshots, some of which I had already shared on Mastodon yesterday.
Before I started, this was the situation in the Netherlands for both flat and 360° images.
And this was yesterday:
This was the state of 360° coverage on Jan 14th 2025:
And this yesterday:
EDIT: I forgot to mention that I tracked where we were going and what I had captured using OSMAnd, so I could prevent duplicate sequences. There are still some duplicates, because of the foggy conditions during the first week and because I didn’t want to risk the camera turning off while inactive on a long drive.
As I had mentioned in the previous diary post, I “lost” (i.e. deleted) sequences where the camera had tilted, because I did not know how to correct that. Stupid me. If anyone knows how to or knows of a tutorial, please comment. I don’t even know what to search for.
While on the way to concert venues, I added a couple of notes along the way to be resolved after the show. Some were resolved by other people in an amazingly short time. So, a big dankjewel to those folks!
I have an interest in thatched buildings, so that was something I mapped while being driven around and while walking, using roof:material=thatch. I added a couple more last night, bringing the total to 101. While mapping that, I noticed that many of the buildings are not squared off. Is there any good reason for that? It made adding building:part=yes for roof:material=thatch a bit awkward. Of course, I also added a couple of missing buildings (mostly sheds or maybe car ports, hard to tell) which had not been covered in the last import by the Dutch mapping community.
Around the theatres, quite a few defibrillators were missing, even though the mapping density of them in general seems to be good in the Netherlands. Many a time, I got OSMAnd out to map one only to find that it was already mapped. But maybe mappers don’t go to the theatre much, so it’s good I came along. I added 25 in total, some also just spotted from the band bus.
I continued on with the backstage prefix to map amenities (showers, washing machine, etc.) for artists found backstage. I don’t expect this to take off in general, but we map amenities for babies and wheelchair users, so why not also this? I know it’s not easily verifiable, but there is no harm in doing it imho. And to be fair, I added nappy changing tables wherever I saw them. I made a point of using every bathroom I could to survey that. ;-)
Loading ramps and artists’ entrances (name=artiesteningang) into the venues were also added to make it easier in the future to know where to go, because there are a lot of one-way systems and pedestrianized areas around the venues.
Obviously, I added missing shops, pubs, street lights, street cabinets, details on restaurants and whatever else I fancied.
I must say, I really appreciate always having a mission, so the waiting around doesn’t get too draining and you “force” yourself to explore your surroundings, if time permits. I wish I had had more time during daylight, because we mostly played in small towns which probably don’t get too much attention from mappers on a regular basis, but we usually arrived for soundcheck at 16:00 with the sun setting around 17:00, so it was quite limiting.
And I made it to #56 in the statistics for the Netherlands on neis-one!
Hi OSM folks,
Yesterday I finished mapping West Virginia’s forest landcover for OSM, after 5 years! It was a big project, and the next one will definitely be bigger, but because of that I’m gonna do it at a pace that I can maintain. If I have to stop, then I stop. Well, it’s just a hobby for me. As I announced at Mapping USA, I’m mapping Pennsylvania from now on. I’m really interested in the history of that state, The Keystone State.
And yes, I just wanted to try out for myself how I could map forest landcover outside of Europe. Everyone seems hyped and I like this!
I’m not a robot so I can’t work on it 24/7 due to my personal life, and I know I’m not producing 100% accurate landcover, but hey, I learnt some tricks which I’m taking advantage of! Sometimes I also criticise the quality of my own work, but people usually improve as time goes by.
I’d like to thank everyone in the OSM community for giving me help and guidance, and I’m sure I’ll still have questions if it comes to specific areas. :)
23/01/2025-29/01/2025
railmap.gl [1], a web map that shows railway operators for North America. © MapLibre, © Three.js | map data © OpenStreetMap contributors.
The following two proposals are up for voting until Friday 14 February:
pratictioners=*: to tag the number and field of professionals, such as doctors, lawyers, or therapists, available at facilities such as clinics, legal offices, or wellness centres, aiming to enhance the mapping of professional services.
sensory_friendly=*: to indicate if a feature provides sensory accessibility for people with sensory processing sensitivity and if there are designated sensory friendly hours.
Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.
This weeklyOSM was produced by MarcoR, MatthiasMatthias, Raquel Dezidério Souto, Strubbl, TheSwavu, barefootstache, derFred, mcliquid.
We welcome link suggestions for the next issue via this form and look forward to your contributions.
I started surveying the America's Cup Walk plaques along Mews Road. There are fewer than I'd thought (ten so far), although I've not yet found a source that says how many there were originally (or when they were installed; I'm assuming it wasn't actually 1987, but maybe just a few years later?).
[todo – photos]
As I wrote earlier, I got into mapping my town for specific reasons.
At the time, the online OSM map was insufficiently detailed and much of the information was outdated or missing. It bothered me. The second reason was that I kept running into visitors and tourists who were looking for something and couldn’t find it on their map. I couldn’t show it to them quickly. It happened over and over.
Other maps existed here, but apart from the commercial printed maps or the map boards of a local map publisher, they didn’t offer tourists sufficiently accurate information. And that is what I wanted to improve.
Now, several years later, it would be nice to write down what, in my experience, has changed for map users and what hasn’t. And finally, a few words on what could be improved.
First of all, I have to say that awareness of the OSM map specifically is very low among lost tourists, because they mostly know and use other online maps. If they use any at all. They have their reasons. They are rarely able to use an offline map client, so for maps they need an active data plan, which is a bit expensive for foreigners in the Czech Republic. So the first thing they look for is free wifi.
Tourists like to use Google Maps because they are permanently signed in to their account on their Android phone. Google Maps is therefore closest at hand, and it never occurs to them that they could use another map. Fine. If they are used to looking at a blank map that arbitrarily shows only what it wants to show from paying customers, then good luck to them. Those who don’t pay won’t be visible on the map. That’s the business model in a nutshell.
Or they come from distant Asian countries, where they would use their own maps with their own place names, but that doesn’t really work in Europe, so their second choice is Google Maps. Google did strong marketing for its products years ago and still benefits from it today, even though a lot has changed.
They know and use Apple Maps, which is a similar case. Apple ships it as standard on iPhones, including integration with other iOS services. An interesting detail is that Apple started using base map data from OSM, because it is the most detailed globally; Apple merely places its own POIs on top of this layer.
Then there are users of car navigation systems, i.e. drivers. Their vehicles have integrated on-board computers offering either their own navigation (possibly some derivative of OSM via the supplied navigators, or other sources). However, a car navigator occasionally needs updating, and drivers sometimes don’t do that. Without a base map update, the map often doesn’t have accurate streets, because they simply don’t exist in it at all.
Waze users are a peculiar bunch who have come to rely one hundred percent on Waze alone, favouring above all its traffic features. In those, the Waze navigator can genuinely help, but drivers then don’t realize that the navigator cannot convey map information unrelated to traffic navigation. That is exactly where Waze users run aground, without ever knowing why. Waze simply doesn’t tell them.
They know and use Mapy.cz (this is mainly Czechs, who consider Mapy.cz the only and best map). In the Czech Republic, Mapy.cz really does offer the most comprehensive map data, which it carefully collects, also taking it from other sources (for example, a database of springs), and it adds to the geography its business model of commercial objects tied to Firmy.cz. Between us, of course, Mapy.cz is also out of date in places; it cannot capture various changes, relocated businesses or the traffic situation in time.
The vast majority of users have no idea that outside the Czech Republic, Mapy.cz uses map data from OSM, which it merely renders in its own map style. Users therefore think that Mapy.cz can show every footpath in the grass in Australia. Yes, it can, precisely thanks to OSM.
Precisely for the reasons mentioned above, people quickly end up in a situation where they cannot find something somewhere, or reality isn’t quite what they found on their map.
Why can’t you walk a few steps from this street to the next one? Well, because there is no path there; there is a steep drop, a wall and a private garden. But on a map that doesn’t even want to include such things, they will never discover that.
Why isn’t restaurant XYZ, which they found on the map, on the street? Because it moved away long ago, and the point on the map was not added by the owner or any local mapper, but by some random people who will never come back to fix this point of interest.
Why is the business closed? Because it doesn’t have to be open. The better question is why the point on the map doesn’t show visible opening hours or a link to the company’s website, which would provide a good answer.
How come this street is impassable or closed? This usually happens with local traffic restrictions that are temporary, lasting a matter of months. Even so, no map of the Waze, Google or Apple type manages to react in time and add the data about passability. But drivers rely on their navigation and are then surprised.
Where are the locally specific points of interest that cannot even be recorded on any map? For example, guided walking routes leading cross-country, connecting unique local attractions. That is what local guides know, and that is why you should follow them even without a map. They know their town and landscape really well.
Perhaps we could also sensitively record in the map the features that make a place specific and interesting. These could be house signs, minor monuments, non-commercial community centres, or natural curiosities in the open landscape.
Despite the practically unnoticeable use of up-to-date OSM maps by foreign and domestic tourists alike, I think they make good sense for some other users. In fact, I am convinced that they serve excellently those people who don’t ask for directions and find their destination or tourist attraction on their own, thanks in part to a good map.
The information experience with OSM can be improved by making the data more precise. Users then get valuable output in practice through the range of applications and mobile clients that take data from OSM. These can be navigators for cars, cyclists, runners and hikers. None of the users actually care where the map data comes from; they just want it immediately and accurate.
Sado Island is an example of a landmass with tripartite physical geography.
The post Sado Island: An Example of Tripartite Geography appeared first on Geography Realm.
Should the name:ms-Arab tag follow the standard of Malaysian Malay? I’m still unsure about adding name:ms to the Arab Melayu (Jawi) signs in Pekanbaru. For example, “danau” (lake) should be “tasik” in Malaysian Malay, but overall I just write the Arab Melayu according to the Indonesian name in Riau, unless there really is a different local name.
The village boundary between Mantangai Hilir Village and Mantangai Tengah Village is the Saka Diwung Bridge.
On this trip back to the US, I deliberately compared OSM everywhere I went, and found that even in the birthplace of OSM, and in the US with its huge number of internet users and GIS professionals, non-profit OSM’s level of detail falls far short of the commercial Google Maps. I used to think OSM was detailed, but that was because my attention was focused on major public facilities such as squares, parks, stations, and well-known attractions. These places naturally have many OSM editors, but down at the level of small and medium-sized towns, coverage is really not high. Then again, on the OSM activity heatmaps I looked at earlier, the US wasn’t an extremely high value either. The most heavily mapped area is in central and western European countries such as Germany, Switzerland, Italy, France, and Poland; I don’t know what the actual situation is like there.
Last month I released my first QGIS plug-in, and promised I’d write an in-depth post about it. I’ll give an overview and dig into some of the motivations, and then I’ll put the details of my experience of coding with AI in its own follow up post.
I’ve been a long-time QGIS user, though I am very far from an expert — I mostly open different files and visualize them. I’ve never been able to afford an Esri license, so it’s QGIS all the way for me. And I’ve always loved the plugin ecosystem: the fact that many people worldwide are adding all kinds of functionality so that anyone can customize it to their needs is just awesome, and a testament to the power of open source. There are still things Esri can do better, but we’re now at the point where there are a lot of things QGIS can do better.
I also recently have ‘become a coder’ again, thanks to the power of AI tools. I’ll dive into more of the experience in my next post, but it meant that I could tackle something like a new QGIS plugin as a (long) weekend project. I started it just to see if I could, and things kept working, so I kept pushing on.
One of my latest missions is to advance GeoParquet as a format to fulfill the promise of cloud-native vector data, enabling organizations to get most of the functionality of a Web Feature Service like GeoServer by simply putting their data up as GeoParquet in a cloud bucket. I was so excited when Overture Maps embraced the format, but they also got a good bit of pushback for not having a ‘download’ button and for not using traditional data formats.
I was confident that if things evolved right it shouldn’t be hard to give traditional GIS users an even better experience of getting the data, since you can easily stream just what you need and transform it on the fly. A big shout out to Jake Wasserman and Overture for really stepping in to help push forward the evolution, proposing the key bbox covering and upgrading Overture to fully implement it.
A few months ago it became possible to use my favorite new geospatial tool, DuckDB (or a number of other tools), with any Overture data layer to select a spatial subset of the whole world and download just the area you cared about in tens of seconds and often faster.
Getting Overture data today
Overture has great docs for using DuckDB, and they also built a nice command-line tool, but you still have to be tech-oriented and inclined to use a terminal. They did also build a nice Explorer app that lets you download small amounts of data. But if you wanted more than a few megabytes’ worth of data to load up in QGIS, there still weren’t great options for those who don’t want to learn to use a terminal and CLI tools.
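For anyone who is comfortable with a terminal, the query involved is only a few lines. Here is a rough sketch of the pattern using DuckDB; the release path and bounding box below are placeholders I made up for illustration, so check the Overture docs for the current release string:
INSTALL spatial; INSTALL httpfs;
LOAD spatial; LOAD httpfs;
SET s3_region = 'us-west-2';
-- Copy a bounding-box subset of one Overture theme to a local file.
-- '<release>' and the bbox values are placeholders, not real values.
COPY (
    SELECT *
    FROM read_parquet('s3://overturemaps-us-west-2/release/<release>/theme=buildings/type=building/*', hive_partitioning=1)
    WHERE bbox.xmin > -122.35 AND bbox.xmax < -122.25
      AND bbox.ymin > 47.60 AND bbox.ymax < 47.70
) TO 'buildings_subset.parquet' (FORMAT 'parquet', COMPRESSION 'zstd');
The WHERE clause on the bbox column is what makes this fast: thanks to the bbox covering mentioned above, DuckDB only has to fetch the row groups whose bounding boxes overlap the area of interest.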
So I decided to see how far the LLM coding tools had come and figure out if I’d be able to write a QGIS plugin. QGIS development had always intimidated me: I think I had one class in college that did desktop UIs and I found it hard to grok. But my first attempt got something on my screen, and within twenty minutes I had a reasonable kernel of functionality. I ended up able to get the vast majority of it working as I wanted in a few days during the week of Thanksgiving — coding on the plane and sneaking in mini-sprints between family time.
So my goal was to make it as easy as possible for any QGIS user to download Overture, and indeed to not force GeoParquet on them: with the plugin you can easily request data as a GeoPackage. And I also wanted to make it easy to download any GeoParquet data, so that the tool isn’t just for Overture data, but enables anyone distributing their data as GeoParquet to easily enable QGIS users to get their data.
This animated gif probably gives the quickest overview to understand what the plugin enables:
The idea is to make it simple to just download GeoParquet data into a local copy in QGIS. It currently just uses the bounds of the viewport, but I hope a future version can give more options to draw a geometry or use other QGIS layers (contributors welcome!).
Currently there are a few pre-set layers. All of Overture is obviously available, and it’s got a dedicated button to open its panel. And Source Cooperative is easily the other largest single collection of GeoParquet files (and if you have open data you’d like to make available on Source then you can likely host it there for free — just reach out!). I still need to add more Source Cooperative files, indeed I hope to make a complete fiboa & Fields of The World section, as we’ve got a lot of data up there.
And after the initial release I added a Hugging Face section, which for now is just the Foursquare OS Places dataset, but it seems like more will be added (I contemplated adding the various embeddings datasets but wasn’t sure of the practical use case of making it easier to download). And you can also just enter any custom URL to a GeoParquet online.
Right now you can download data as GeoParquet, DuckDB and GeoPackage. GeoPackage will always work, as all QGIS installations support it. GeoParquet should work on most recent installations, though OS X is less straightforward (but I am working with opengis.ch to try to make this better!). DuckDB files right now won’t load in QGIS, but I’m starting to collaborate with the QDuckDB plugin team, and I think I should be able to render the results of a DuckDB download if their plugin is installed.
The awesome QDuckDB plugin
And that team also deserves a shout-out. Their plugin was the one I looked at the most for how to structure things, and they are working to solve a core issue that I need for the plugin to work well — install DuckDB. DuckDB is the core engine that powers the entire thing, as everything I did was just wrappers to all of its amazing functionality.
If this seems like something that’s useful to you it should be pretty easy to install the plugin. Just open the plugin manager and search for ‘GeoParquet’.
I think the installation process is now pretty good. Matt Travis, the first outside contributor to the plugin, worked to get it to automatically install. I think it works most of the time, but I’m not 100% sure — it attempts to automatically use ‘pip’ to install DuckDB, but I’d guess that’s sometimes blocked. My hope is GDAL 3.11 with ADBC support will enable a more ‘native’ DuckDB experience in QGIS, and that we’ll be able to include it as a core dependency.
ADBC GDAL/OGR docs — coming in 3.11!
It is on the list for the plugin to add support for more formats (which should be a great first issue for any potential contributors) — FlatGeobuf is the top of my list, and File Geodatabase also sounds interesting. If there’s other formats desired just add them to the issue. I’m pretty opposed to adding Shapefile since it comes with so many limitations that I think will get in the way of using Overture and other data, but if someone wants to make a PR and really needs it I’m sure I’d accept it.
I’ve got a number of ideas in the issue tracker, but I’d love to hear from others what they’d like to see. I don’t see this being a huge project, and indeed I could see one route of ‘success’ being that this type of functionality is more incorporated into the QGIS core. It’s a bit of a different workflow, that I actually think would also be interesting with traditional geospatial servers (WFS, ArcGIS Feature Service, etc). Instead of having QGIS try to stream data on each screen change just have the user manually ‘check out’ the data that they want — download it and then display / use that local version.
The top future ideas that I’m thinking about are:
I’d love more help on this project, and my hope is to make it an experiment of AI-enabled open source. Since I wrote 99% of it with AI coding tools I’m very happy to have all the contributions be similarly made, so if you’ve been wondering about how it all works and want a practical introduction that creates code for others to use then please take an issue!
I had thought I would also share more about my experience of using AI coding tools to create it, but since this post is already quite long I’ll break it up into its own. I’ve also got a number of insights into the state of public GeoParquet files and how we can improve the ecosystem of public data, but I’ll also save that for its own post. So stay tuned! I hope to publish both of those posts soon.
I started mapping in June 2020 as a way to find parks and trails near my home in Redmond, Washington. My daughter loves adventures and provides huge motivation. I am a former compiler engineer and bring a passion for great tooling. I love being able to work with others on an expansive, vivid, and important project.
Mapping continues to provide joy and satisfaction. I am astounded by the number of passionate, clever, and kind mappers that I have had the pleasure of collaborating with over the last two years. Most of my editing remains in the United States but who doesn’t love a little “vacation” every once in a while?
OSM-US has achieved so much in the past two years that it’s difficult to list everything. More conferences, expanding working groups, even more hiring and fundraising.
It’s been an enormous pleasure to work with my fellow board members and our Executive Director to continue the upward trend of this amazing organization. I hope to continue that work.
I have learned so much over my two-year term but I remain committed to the principles in my original position statement:
Great organizing makes contributing easier and more engaging. The working group model continues to grow and pay dividends. It remains one of the best ways to identify, engage with, and solve some of our biggest issues. The Education Working Group will soon be able to roll out sandboxed instances of OSM so that our newest mappers can safely play and learn. Our newest working group is tackling pedestrian infrastructure and has been hugely successful at bringing together various parties to help us all move collectively forward.
I believe the most successful map is the one we ALL build together. I have continued to engage productively with folks across the OSM-US Slack, Mastodon, community board, Discord, etc. Over this time I was asked to be part of the moderation team for the OSM Mastodon instance and have taken on the “helper” role in the OSM Discord. We don’t all need to be in the same spaces to make great maps but it is often helpful in my role as board member to have an expansive view of the community, even those outside the US.
The map today is great but the map tomorrow will be even better. It can be intimidating and difficult to know what work “needs” to be done. There are a million things to do and lots of energy to do them. Over the past two years I have facilitated numerous efforts to help folks direct their energy.
Some projects I will be working on this coming year to achieve a better map:
Please reach out if you have questions/concerns/just want to chat, I love to collaborate with and support the goals of other mappers. Nothing is immovable when we all work together.
In January 2025, after attending the Wikimedia Taiwan annual conference, I decided to find some time to improve the geographic information and encyclopedia articles for Matsu.
Complete guide to the Mapillary Desktop Uploader: https://help.mapillary.com/hc/en-us/articles/360020825811-Mapillary-Desktop-Uploader-the-complete-guide
Insta360 X3 (but I use the X2): https://help.mapillary.com/hc/en-us/articles/11951588568604-Insta360-X3
Today I will talk about the new version 0.8 of the userscript that adds several useful features to osm.org.
You can view the existing features of the script and install it on GitHub or in the OSM Wiki.
My first PR was merged to openstreetmap-website code, and now the Map Data layer loads instantly. Be sure to try it. Thanks to the maintainers for help!
For the script, this means a huge acceleration in rendering large relations and GPS tracks (yes, tracks, read more!):
The 👥 icon near relations is clickable 😉
Now they are displayed directly on the website. In notes, in tag history, and in changesets.
Finally, by opening the starting coordinate of the track, you will be able to see the entire GPX track on the map:
Let me remind you that the Z key allows you to fit an object on the screen.
The track is also drawn when opening notes from StreetComplete
The rendering of tracks is still very simple; if you have any examples of successful visualizations, please share them.
Now you don’t have to open the changesets to understand if there is an interesting discussion or if this is another revert of vandalism.
By the way, long links like *.openstreetmap.org are replaced by *.osm.org.
upd: this feature was recently renamed to Followings. Now there are no friends in OSM (:
Did you know that OSM has a friends feature? You can even follow their edits here: https://osm.org/history/friends
better-osm-org can now filter changesets on this page. The script also tries to indicate next to the username that they are your friend.
Let’s be friends :) https://osm.org/user/TrickyFoxy/follow
Satellite images can be switched using the S key, or using the button on the side of the note.
But what about Firefox for Android? And how do I find out about S? Now there is a button in the right panel for this.
Pressing the Shift key will show you the ESRI Beta images. But be patient, because they take longer to load.
If we can now quickly render data, then why not be able to open it from files? Just drag them onto the map:
p.s. for now, it’s more of a PoC, feel free to suggest your ideas.
What would we do without them? :)
` — hide the geometry of the changeset from the map
T — toggle between compact and full tags diff mode
F — filter changesets
H — user changesets history (T — tracks, D — Diary, C — comments, N — notes)
U — open user profile from changeset, note, …
Shift + U — open your profile
By the way, do you know about the new feature of the site? https://www.openstreetmap.org/user/SomeoneElse/changeset_comments
As well as about the previous versions that I wrote about on the forum.
so that you have the necessary link from the wiki at hand :) Buttons for commenting on a changeset will only add text to the input field, without submitting.
In order not to clutter the map, relations are shown only on hover. Now the relation can be pinned.
Now, when you click ⏬ on the way history page, all changes will be displayed, including changes to the tags and to the coordinates of the way’s nodes.
If there are too many versions, you will see a versions filter. The K/L key combination also works, but I recommend trying ScrollAnywhere to navigate the map with the middle mouse button.
⚠️ there may be problems if some nodes are hidden by moderators.
Other
https://taginfo.openstreetmap.org/relations/waterway#roles https://taginfo.openstreetmap.org/reports/key_lengths#keys
p.s. The script also fixes broken links to Overpass Turbo on https://taginfo.geofabrik.de, if there is a space in the country or region name.
Bypass redactions (so far only for tags), even for edits before 2012! Data that is hidden is uploaded to GitHub: https://github.com/deevroman/better-osm-org/issues/54
Shift + L — go to your location
Shift + N — create a new note (if an object was open, its link will be prefilled in the note text)
As more satellite imagery has become openly available, efforts in mapping the Earth’s surface have accelerated. Yet the accuracy of these maps is still limited by the lack of in-situ data needed to train machine learning algorithms. Citizen science has proven to be a valuable approach for collecting in-situ data through applications like Geo-Wiki and Picture Pile, but better approaches for optimizing volunteer time are still required. Although machine learning is being used in some citizen science projects, advances in generative Artificial Intelligence (AI) are yet to be fully exploited. This paper discusses how generative AI could be harnessed for land cover/land use mapping by enhancing citizen science approaches with multi-modal large language models (MLLMs), including improvements to the spatial awareness of AI.
Visual interpretation tasks undertaken by ChatGPT for (a) a wetland/mangrove landscape in South America and (b) an agricultural area in central Europe.
Integrating multi-modal Large Language Models (MLLMs) in a citizen science visual interpretation workflow.
See, L., Chen, Q., Crooks, A., Bayas, J.C.L., Fraisl, D., Fritz, S., Georgieva, I., Hager, G., Hofer, M., Lesiv, M., Malek, Ž., Milenković, M., Moorthy, I., Orduña-Cabrera, F., Pérez-Guzmán, K., Schepaschenko, D., Shchepashchenko, M., Steinhauser, J. and McCallum, I. (2025), New Directions in Mapping the Earth’s Surface with Citizen Science and Generative AI, iScience, doi: https://doi.org/10.1016/j.isci.2025.111919. (pdf)
Épicerie coopérative (cooperative grocery store)
Geomob London took place at 6:00 PM on Thursday the 30th of January, 2025 at the Geovation Hub (Sutton Yard, 65 Goswell Rd, London EC1V 7EN).
Our format for the evening will be as it always has been:
doors open at 18:00, set up and general mingling
at 18:30 we begin the talks with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. After each talk there will be time for 2-3 questions.
We vote - using Feature Upvote - for the best speaker. The winner will receive a SplashMap and unending glory (see the full list of all past winners).
We head to a nearby pub for discussion and #geobeers paid for by the sponsors.
Michael Dales, Geospatial to save the planet: assessing tropical forest restoration projects using all the data
Helen Mazalon, Satellite and Seeds: Helping the most vulnerable in northern Uganda with GIS
Please volunteer to speak at future events.
Geomob London is organized by Ed Freyfogle and Steven Feldman
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
Hello friends, I am going to edit the Antalya region. Does the map API have a feature for adding coordinates and names? If so, I could very simply match the whole country with Google Places.
Grounding lines are the boundaries where glaciers and ice sheets transition from resting on solid ground to floating on seawater.
The post Understanding Glacier Grounding Lines appeared first on Geography Realm.
I found myself sitting in a hospital talking to a doctor: Doc: You sure you haven’t had a heart attack? Me: I’m pretty sure…..Wouldn’t I know? Doc: Well……Yes and no. It was a conversation I didn’t want to have but there I sat having it. I didn’t know this was going to turn into the […]
The post Once upon a time appeared first on North River Geographic Systems Inc.
Geomob Edinburgh was held at 6pm on Tuesday, January 28th, 2025.
at The Melting Pot at 15 Calton Rd, Edinburgh EH8 8DL (Google Maps, OpenStreetMap)
Our format for the evening will be:
doors open at 18:00, set up and general mingling
at 18:30 we begin the talks with a very brief introduction
Each speaker will have slides and speak for 10 minutes. After each talk there will be time for 2-3 questions.
We head to a nearby pub for discussion and #geobeers paid for by diagonalWorks and OpenCage.
We are always looking for speakers, volunteer to speak!
Geomob Edinburgh is organized by Gala Camacho
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
The new Abstract Specification Topics define fundamental concepts and operations for Coverages, the digital representations of varying phenomena over a specific spatiotemporal extent.
The post OGC approves two new Abstract Specification Topics concerning Coverages appeared first on Open Geospatial Consortium.
A goal for me this year is to ‘ship more’, so in the spirit of releasing early and often I wanted to share a little new project I got going this past weekend. See https://github.com/cholmes/geoparquet-tools.
It’s a collection of utilities for things I often want to do but that aren’t trivial out of the box with DuckDB. It started focused on just checking GeoParquet files for ‘best practices’, which I’ve been working on writing up in this pull request, as I realized that lots of people are publishing awesome data as GeoParquet but don’t always pick the best options (and the tools don’t always set the best defaults). So it can check compression, whether there’s a bbox column, and row group size. It also attempts to check if a file is spatially ordered, but I’m not sure if it works across different types of approaches. It does seem to work with Hilbert curves generated from DuckDB.
I do need to refine the row group reporting a bit — I think the row group size in bytes is more important than the number of row groups, but I want to try to gather more information about what’s optimal there.
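If you just want to eyeball a file yourself without the tool, DuckDB’s built-in parquet_metadata() table function exposes most of what these checks look at; a minimal sketch, with a made-up filename, looks like this:
-- Inspect the compression codec and row-group layout of a Parquet file.
SELECT row_group_id, row_group_num_rows, path_in_schema, compression
FROM parquet_metadata('example.parquet')
ORDER BY row_group_id
LIMIT 20;
That won’t tell you whether the file is spatially ordered, but it is a quick way to spot uncompressed files or wildly oversized row groups.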
From there I made a utility to do the Hilbert ordering that DuckDB can do. This was initially just for convenience, so I could make a one-line call instead of remembering the complex SQL statement. But then I realized that it has some real utility, as DuckDB still doesn’t pass through projection information, so if you run the Hilbert DuckDB command on projected data the output isn’t so useful. So I made the output utilize the input Parquet metadata. It also writes things out with the best practices I’m checking for, including adding a bbox column. I’m hoping to make it easier to turn that on/off, and to also pull out CLI commands that can run the full formatting or any part of it, but it’s proven a bit trickier than I was hoping.
The other main piece of functionality was to make it easier to create the ‘admin-partitioned’ GeoParquet distributions that I blogged about a while ago. I got excited about these, but then they didn’t seem to go anywhere. But I think there are some places where they can be quite nice, and I want to try it on this Planet dataset of ML-generated field boundaries for all of Europe. So I decided to build a utility that’s a bit more generic.
Matthias Mohr and Ivor Bosloper put together this great administrative division extension for fiboa:
I’ve been thinking a lot about pulling things out of fiboa / STAC to just be a part of the general GeoParquet ecosystem, and this one seems like a perfect one to start with. It has a real practical utility, as once you add these codes you can then split your files by them to partition them spatially.
I did it as two commands, one to add the column (gt add admin-divisions) and then one to split based on the column (gt partition admin). It’s just countries for now, but I hope to add subdivision_code. And it’s just based on Overture, but I also hope to make it so the definition of the admin boundaries is flexible and configurable.
My hope is to add more partitions to the CLI, like the ones Dewey discussed in his post on Partitioning strategies for bigger than memory spatial data. I’m also hoping to get in more ‘sort’ options, to expand the gt add sub-command to perhaps add h3, s2, geohash, etc., and to add the bounding box column to any file (I built it into the Hilbert sort, so I just need to get it fully working and extract it out).
I was hoping to create my first proper PyPI package so I could let people pip install geoparquet-tools, but I ran out of time for this round. I hope to do it soon, and to also add proper tests. And then my further hope is to also distribute at least a subset of this functionality as a QGIS plugin, and/or incorporate it in my GeoParquet downloader plugin, so people can easily check how well remote Parquet files follow the best practices.
The post Tech Fellow Update: Exploring Field Boundary Data with LLMs appeared first on Taylor Geospatial Engine.
DuckDB continues to be my go-to tool for geospatial processing, after I discovered it over a year ago. Since that time its functionality has continued to expand, and as of version 1.1 it reads and writes GeoParquet natively, as long as you have the spatial extension installed.
LOAD spatial;
CREATE TABLE fields AS (SELECT * FROM 'https://data.source.coop/kerner-lab/fields-of-the-world-cambodia/boundaries_cambodia_2021.parquet');
COPY fields TO 'cambodia-fields.parquet';
Be sure to always run LOAD spatial; otherwise the table won’t get a geometry column, it will just create blobs. If you see errors, or your output data is plain Parquet and not GeoParquet, that’s likely the source of your problems. I often forget to add it at the beginning of my sessions; perhaps there is some nice way to configure DuckDB to always load it, but I don’t know it (yet).
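One small convenience, if you mostly use the DuckDB command-line client: as far as I know the CLI reads a ~/.duckdbrc file on startup, so you can put the install/load there (this won’t help the Python or other library bindings, though):
-- Contents of ~/.duckdbrc, run at the start of every CLI session.
INSTALL spatial;
LOAD spatial;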
I also do recommend that you always use zstd compression, as it generally results in at least 20% smaller files, and its speed is comparable to snappy.
COPY fields TO 'c-fields.parquet' (FORMAT 'parquet', COMPRESSION 'zstd')
DuckDB’s GeoParquet writer always includes the new bounding box column, which enables much faster spatial filtering. If you are translating GIS data from any format with a spatial index (GeoPackage, FlatGeobuf, Shapefiles) into DuckDB then you don’t need to do anything additional. But sometimes you get data that is not spatially ordered at all. Previously I would write the data out from DuckDB and use another tool to order it, but now the ST_Hilbert function can be used to order your data.
I recently got help on the DuckDB Spatial discussions for how to properly do this, so wanted to write that up for everyone. I’ve been processing Planet metadata that gets served from Planet’s Data API, working to try to make a STAC-GeoParquet version of it. The data is ordered by time, so when you load the full dataset it just fills in everywhere.
I had a false start with the Hilbert curve function, which resulted in a cool pattern of loading the data.
Unfortunately the resulting ordering isn’t all that helpful to optimize spatial queries.
After Max, the author of the DuckDB spatial extension, explained the importance of the ‘bounds’ argument, I was able to get much better results:
So I’d recommend if you are using the ST_Hilbert function that you always include the bounds. For a global dataset like mine you can just do something like:
CREATE TABLE ps_ordered AS SELECT * FROM ps ORDER BY ST_Hilbert(geometry, ST_Extent(ST_MakeEnvelope(-180, -90, 180, 90)));
You can just order as you write the Parquet:
COPY (SELECT * FROM ps ORDER BY ST_Hilbert(geometry, ST_Extent(ST_MakeEnvelope(-180, -90, 180, 90)))) TO 'ps-sorted.parquet' (FORMAT 'parquet', COMPRESSION 'zstd');
But it can be a pretty intensive operation on larger datasets, so I like to make the table and then write it out separately.
One cool thing is that proper ordering can help the size of the data, by enabling better compression. The original data was 1.37 gigabytes, and I believe was ordered by time. The badly ordered one was 2.21 gigabytes, and then the properly ordered one was only 1.24 gigabytes.
If your dataset is not global then you can use DuckDB to get the bounds of the dataset with a call like:
SELECT st_extent(ST_Extent_Agg(COLUMNS(geometry)))::BOX_2D FROM ps;
You would have to save that call’s output somewhere — if you’re writing code that calls DuckDB you can just store it in your code, or you could use the bounds and then paste in to MakeEnvelope. Or you can try to do it all in one call — I’ve not tested extensively, but I believe this call should work (credit due to ChatGPT for this one):
SELECT * FROM ps ORDER BY st_hilbert( geometry, ( SELECT st_extent(ST_Extent_Agg(COLUMNS(geometry)))::BOX_2D FROM ps ) );
You can use that to create the table, or to directly write the data out.
I hope this post helps others, and soon gets into the LLMs. A big thanks to Max for all his amazing work on the spatial extension, and for helping me figure out how to get the Hilbert curve working!
Gratuitous Picture to use in story profile.
OGC invites developers and other contributors to the next Open Standards Code Sprint, to be held from March 25-27, 2025.
The post Registrations Open for the next OGC Code Sprint appeared first on Open Geospatial Consortium.
The GRASS GIS community honours Roger Bivand’s long-standing contributions to the development of the rgrass package.
The post Großer Dank an Roger Bivand! appeared first on Markus Neteler Consulting.
We are thrilled to announce the development of the BNG Co-Pilot, an innovative platform for Biodiversity Net Gain (BNG) assessment supported by the Taylor Geospatial Institute (TGI) and Amazon Web Service (AWS). This groundbreaking initiative applies Generative AI and advanced geospatial analytics to address the complexities of ensuring that land development projects in the UK …
Introducing BNG Co-Pilot Read More »
The post Introducing BNG Co-Pilot appeared first on Sparkgeo.
The PostGIS Team is pleased to release PostGIS 3.5.2.
This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5+ is needed.
Cheat Sheets:
This release is a bug fix release that includes bug fixes since PostGIS 3.5.1.
The post Transforming Agriculture in Latin America with Geospatial Innovation appeared first on Taylor Geospatial Institute.
This QGIS tutorial guides you through pan sharpening Landsat imagery by combining the 15-meter panchromatic band with lower-resolution multispectral bands.
The post Pan Sharpen Landsat Imagery in QGIS appeared first on Geography Realm.
Last December, the OGC@30 Anniversary Celebration and OGC Innovation Days DC 2024 brought together geospatial experts under the theme Spatial Data Infrastructure (SDI) Futures.
The post Overture Maps Foundation at OGC Events 2024: Championing Open Data and Interoperability appeared first on Open Geospatial Consortium.
I did a lot of reading last year, a lot, perhaps because I had a lot of down time. I tend to read before going to sleep, and recovery from surgery and other things means I go to bed early and then fill the time between bed and sleep with books. Books, books, and more books.
To be totally precise, I read books on a Kindle, which allows me to read in the middle of the night in the dark with the back light. Also to read from any position, since all books are the same, light weight when consumed via an e-reader. I am a full e-reader convert.
Anyway, I’ve had means, motive and opportunity, and I read a tonne. Some of it was bad, some of it was good, some of it was memorable, some not. Of the 50 or so books I read last year, here are ten that made me go “yes, that was good and memorable”.
Demon Copperhead, Barbara Kingsolver
I used to read Booker Prize winners, but I found the match to my taste was hit-and-miss. The Pulitzer Prize nominees list, on the other hand, has given me piles of great reads. I am still mining it for recommendations, older and older entries.
Anyways, this modern-day re-telling of Dickens’s David Copperfield is set in Appalachia, amid the height of the opioid crisis. The book is tightly written, has some lovely turns of phrase, and a nice tight narrative push, thanks to the borrowed plot structure. I re-read the Dickens after, because it was so much fun to mark out the character borrowings and plot beats.
Master Slave, Husband Wife, Ilyon Woo
This non-fiction re-telling of an original slavery escape narrative is occasionally verbose, but an excellent entrant into a whole category of writing I did not know existed, the contemporaneous slavery escape narrative. For obvious reasons, abolitionists before the Civil War were keen to promote stories that humanized the people trapped in the south, who might otherwise be theoretical to Northern audiences.
The book re-tells the escape of Ellen and William Craft, and wraps that story in a lot of historical context about the milieu they were escaping from (Georgian slavery) and to (abolitionist circles in the North). The actual text of their story is liberally quoted from, but this is a re-telling. Frederick Douglass appears in their story, which gave me the excuse I have long been waiting for to read the next book in this list.
Narrative of the Life of Frederick Douglass
It took me way too long to finally pick up this book, given that Douglass has showed up as such an important figure in the other historical books I have read: Team of Rivals, Memoirs of Ulysses S. Grant, And There Was Light.
One goes into books from the 1800s wondering just how punishing the language is going to be. Clauses upon subclauses upon subclauses? None of that here. Douglass writes wonderfully clean prose the modern mind can handle, and tells his story with economy but still enough context to make it powerful. Probably because, as a master storyteller, he was pitching for an audience much like the modern one – made up of people with little knowledge of the particulars of the slave system, just a broad and overly simple sense of the injustice. After 150 years, still devastating and accessible.
How Much of These Hills Is Gold, C Pam Zhang
The Goodreads crew does not seem to think this book is as good as I do, but what strikes me about it, and what makes me slot it into my “year’s best”, is that I remember it so clearly. This is a historical novel of the California gold rush, from the eyes of children born to Chinese immigrants in the gold fields. It’s both an intense family drama and a meditation on the power of place. It left me with a strongly remembered sense of the land, and the characters. Even though it covers a big swathe of years, the cast of characters remains small and their interactions meaningful. It’s memorable!
(Also, and this is no small thing, I read In the Distance by Hernan Diaz this year too, which is set in the same time period and has some of the same beats… so maybe these books are a pairing.)
Julia, Sandra Newman
It’s a great time to be reading about authoritarianism! In the same spirit as pairing up Demon Copperhead with David Copperfield, I also paired up a reading of George Orwell’s 1984 with this retelling of the same story from the point of view of Julia, the love interest in Orwell’s book.
Newman takes the opportunity to flesh out Julia as a character and also the world of 1984 a little more, which makes the re-read of the original really fun. I do not think I noticed before just how much Winston Smith is a self-absorbed schmuck, but once you’ve seen it, you cannot unsee it.
The Bee Sting, Paul Murray
A tragedy told from the inter-leaved view points of four members of a family falling apart. Each chapter from a different character, each builds up the point of view narrator and also illuminates the others. Mostly the reveal is who these people are, bit by bit, but the plot also slowly clicks together like a puzzle until that last piece slides in, and oh boy.
An easy engaging read that gets more and more intense, but you cannot look away.
Yellowface, R F Huang
Written by an Asian-American author, about a white author appropriating the story of an Asian-American author, the story is gripping, snarky, and unblinking in its takedown of the publishing industry. Come for the plot, stay for the commentary on modern meme-making and self-promotion, the intersection between who we are and who we present ourselves as. On the internet, nobody knows you are a dog. Or everybody knows you are a dog and hates you for it.
The Librarianist, Patrick deWitt
I don’t think this book made many or any “best of” lists, so it is not clear to me what caused me to read it, but it was a treat. Just a very quiet story about an introverted retired librarian, finding his way as he transitions into retirement, and builds some new connections with his community. Sounds really boring, I know, but I hoovered it up and it still sticks with me. A good read if you need some optimism and calm in your life.
Say Nothing, Patrick Radden Keefe
A history of the Troubles in Ireland, wrapped around the story of a particular murder, long unsolved, that slowly reveals itself over the decades, as the perpetrators come to terms with their part in that violent chapter of history. The Goodreaders really like this one and I agree. I knew the bare minimum of this chapter of world history (what I gleaned from CNN at the time, and from Derry Girls more recently) and this telling makes an easy introduction, covering a wide sweep of time and context.
Notes from the Burning Age, Claire North
Claire North remains a lesser-known science fiction author, despite her low-key hit The First Fifteen Lives of Harry August (read it!), but I’m a convert, and this novel reminded me why. The world is a post-climate crisis culture that has achieved some spiritual and technological balance with the ecology, but is wrestling with the return of what we would describe as “business as usual” – the subjugation of the natural world to the needs of humans.
Following an ecological monk, turned spy, from inside the capital of the new humanists, through the other realms of this world is easy because the journey is wrapped in a high-stakes espionage story. Of all the climate stories I have read lately, this one taken from such a long distance in the future speaks to me most. I want to think we will build something new and better, and while I know our human nature can be malign, I also know it can be beautiful.
Trust, Hernan Diaz
Best for last. Told in multiple sections from multiple perspectives in multiple styles, every narrator is unreliable, each in their own way, but the idea that there is a kernel of truth lying beneath it all never goes away (and yet, is never truly revealed). Perhaps a perfect book club novel for that reason. (Not where I got it; it’s another Pulitzer winner.)
Some facts everyone agrees on. There is a very rich and powerful financier. He has a relationship with a woman, whom he marries, who is very important to him. But in what way? Unclear. And the man is malign, but in what ways? The usual mercenary ones you might expect of a Wall Street lion? Worse and additional ways? Unclear. The whole thing is a puzzle box: the language, the characters, the events. Read it. Read it again. Read it a third time.
The civil society organization AfroLeadership, based in Cameroon and serving a wide swath of sub-Saharan Africa, has become a member of OGC.
The post AfroLeadership Joins OGC to Support Advocacy for Transparency, Accountability, and Citizen Participation in Public Policies appeared first on Open Geospatial Consortium.
Enhance spatial detail in multispectral images with pan sharpening. Learn how this GIS technique combines data for sharper, more detailed satellite imagery.
The post Pan Sharpening in GIS appeared first on Geography Realm.
Thursday, 20 February, in the afternoon from 16:30 to 19:30
Tampere University of Applied Sciences premises (Kuntokatu 3, Tampere)
Sign up here by 17th of February
Our format:
Doors open at 16:30 for set up and general mingling over refreshments
At 17:00 we begin the talks with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. Talks will be in English. After each talk there will be time for 2-3 questions.
After the presentations, the event continues with networking and mingling.
Mikko Vesanen, Novatron – Use of geospatial data in Novatron applications
Teijo Meriläinen, Kelluu – Redefining Geospatial Data with Kelluu Airships
Markus Hohenthal, Lentola Logistics - Using Location Information in Drone Logistics
Ilpo Tammi, Ubigu – Power of GIS
Geomob Finland, Tampere is organized by Tampere University of Applied Sciences in co-operation with Location Innovation Hub and SIX.
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
Isostatic rebound is the Earth's slow rise after glaciers melt, reshaping coastlines, revealing landforms, and altering sea levels globally.
The post Isostatic Rebound: How Earth’s Surface Rises after Glaciers Retreat appeared first on Geography Realm.
A groundbreaking study recently published in Nature Mental Health conducted in collaboration with the TReNDS Center at Georgia State University, New Light Technologies, Inc. (NLT), and researchers from around the world reveals how urban features such as built environments, nighttime light emissions, and vegetation significantly influence children's brain development, cognition, and mental health. This pioneering research combines satellite-derived environmental data with advanced neuroimaging to provide novel insights into the intricate relationship between urban living and young minds.
Geomob Lisbon will take place on the evening of Wednesday, March 19th, 2025 at Startup Lisboa, Rua da Prata 80, 1100-420 Lisbon (Google Maps, OpenStreetMap). Doors open at 18:00 and talks will begin at 18:15.
Doors open at 18.00, set up and general mingling
Talks begin at 18:15 with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. After each talk there will be time for 2-3 questions.
We vote for the best speaker. The winner will receive the best speaker prize and unending glory (see the full list of all past winners).
Discussion and #award and #geobeers paid for by the sponsors.
Speaker volunteers are always welcome
Geomob Lisbon (GeomobLX) is organized by Miguel Marques and Joana Simoes.
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
Please share the event details with everyone you know who may find the evening interesting.
If you can’t attend (or even if you can) be sure to sign up for the monthly Geomob mailing list, where we announce upcoming events.
Geomob Edinburgh x Edinburgh Earth Observatory Seminars will be held at 5:30pm on Friday, March 28th, 2025.
at The ECCI at Edinburgh Climate Change Institute, High School Yards, Edinburgh EH1 1LZ (Google Maps, OpenStreetMap)
Our format for the evening will be:
doors open at 17:30, set up and general mingling
at 18:00 we begin the talks with a very brief introduction - some talks brought by Geomob, some brought by EEO
Each speaker will have slides and speak for 10 minutes. After each talk there will be time for 2-3 questions.
We head to a nearby pub for discussion and #geobeers sponsored by OpenCage and Other sponsor?!.
To be announced.
We are always looking for speakers, volunteer to speak!
Geomob Edinburgh is organized by Gala Camacho
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
Geomob Berlin will take place at 18:00 on Wednesday the 4th of June, 2025. Location to be announced.
Our format for the evening will be as it always has been:
doors open at 18:00, set up and general mingling
at 18:30 we begin the talks with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. After each talk there will be time for 2-3 questions.
We vote - using Feature Upvote - for the best speaker. The winner will receive a SplashMap and unending glory (see the full list of all past winners).
We head to a nearby pub for discussion and #geobeers paid for by the sponsors.
Georg Held, Community Mapping for Safe Roads to School
Evgenii Burmakin, Dawarich and how it uses geospatial services and tools
Dr. Alana Belcon, Maps - Should we trust them?
Maik Busch, VulkanMaps, a GPU based Rendering Engine for OpenStreetMap
More to be announced. Volunteers needed.
Geomob Berlin is organized by Peter Rose and Ed Freyfogle
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
The GRASS GIS community recognises the long-term contributions of Roger Bivand for the development of the rgrass package.
The post A big thank you to Roger Bivand! appeared first on Markus Neteler Consulting.
As you create your 2025 budgets, we invite you to join us as a sponsor at CNG Conference 2025 in Snowbird, Utah from April 30 to May 2, 2025. Sponsorship amplifies your organization’s visibility and aligns you with the innovators and leaders driving the future of geospatial data.
CNG Conference 2025 is designed to foster collaboration, innovation, and growth within the geospatial community, with sessions organized around four main tracks: On-ramp to Cloud-Native Geospatial Data, The Bridge Between Science and Technology, Technically Advancing Cloud-Native Geospatial, and Enabling Interoperability. These tracks will guide attendees in exploring foundational skills, interdisciplinary collaboration, technical advancements, and best practices, ensuring a comprehensive experience for professionals across the geospatial field. Through keynotes, workshops, and networking opportunities, the conference aims to advance knowledge sharing, career development, and community growth.
Sponsorship for the CNG Conference 2025 requires a base sponsorship of $5,000. Then, sponsors can select add-ons to amplify their presence and demonstrate leadership in the geospatial community by offering targeted branding, speaking engagements, and direct engagement with attendees.
To learn more about sponsorship opportunities, please visit 2025-ut.cloudnativegeo.org/sponsor.
The post Taylor Geospatial Institute Announces Next Round of Planet Fellowship Program appeared first on Taylor Geospatial Institute.
Themed 'AI for Geo,' the meeting provided members with insights into what’s happening at OGC and how interoperable technologies are critical for tackling global challenges.
The post A Recap of the 130th OGC Member Meeting, Goyang, Korea appeared first on Open Geospatial Consortium.
The U.S. Census Bureau has officially released its final data product from the 2020 Census: the Supplemental Demographic and Housing Characteristics File (S-DHC). This release marks a significant milestone, as it brings a wealth of detailed information about households and the people living in them across the United States, Puerto Rico, and the District of Columbia.
OGC’s newest Principal Member, BNETD, assists the Government of Côte d'Ivoire and other countries in eastern sub-Saharan Africa with major economic development projects.
The post Côte d’Ivoire’s BNETD, a Key Player in Economic Development, Joins the Open Geospatial Consortium’s Global Community of Experts appeared first on Open Geospatial Consortium.
The high-profile awards recognize outstanding individuals and organizations for their exceptional contributions to the community.
The post OGC Honors Geospatial Leaders and Dedicated Contributors at OGC@30 Anniversary Event appeared first on Open Geospatial Consortium.
OGC API – Moving Features provides a standard way to manage and interact with geospatial data representing phenomena and objects that move and change over time.
The post OGC Membership approves OGC API – Moving Features – Part 1: Core as an official OGC Standard appeared first on Open Geospatial Consortium.
The post TGI Challenge: Harnessing Geospatial Innovation for a more food secure future appeared first on Taylor Geospatial Institute.
The post TGI is proud to join the Cloud Native Geospatial Forum appeared first on Taylor Geospatial Institute.
Kyle Barron, Cloud Engineer at Development Seed.
I have a bit of a nontraditional background; I have virtually no official training in geography or computer science. In college, I was interested in urban and environmental economics, trying to understand how policies shape cities and the environment. I planned to pursue a PhD in economics and after college worked for a health economics professor at MIT for two years.
In that time I learned data analysis skills, but more importantly, I learned that I preferred data analysis and coding to academic research. I decided not to pursue a PhD and left that job to hike the Pacific Crest Trail, a 2,650-mile hiking trail from Mexico to Canada through California, Oregon, and Washington. Over five months of hiking, I had plenty of time for reflection and decided to try to switch to some sort of career in geospatial software or data analysis.
I didn’t particularly want to go back to school and had plenty of ideas to explore, so I decided to build a portfolio of projects and hopefully land a job directly. For seven months, my learning was self-directed in the process of making an interactive website dedicated to the Pacific Crest Trail. I learned core geospatial concepts like spatial reference systems and how to manage and join data in Python with GeoPandas and Shapely. I learned basic JavaScript, how to use and create vector tiles, and how to render basic maps online. All the content served from that website I figured out how to generate: OpenStreetMap-based vector tiles, a topographic map style, NAIP-based raster tiles, and USGS-based hillshade data and contour lines.
From the beginning, I was excited about the browser because of its broad accessibility. I wanted to share my passions with the non-technical general public. Everyone has access to a web browser while only domain specialists have access to ArcGIS or QGIS and know how to use them. Even today, if the Pacific Crest Trail comes up in conversation, I’ll pull up the website on my phone and show some of my photography on the map. Interactive web maps connect with the general public at a deeper level than any other medium.
These projects also led me to my first software job. As I was building my own applications, I also contributed bug fixes back to deck.gl and the recently-formed startup Unfolded offered me a job!
GeoParquet and GeoArrow are both ways to speed up handling larger amounts of geospatial data. Let’s start first with GeoParquet since that’s a bit more approachable to understand.
GeoParquet is a file format to store geospatial vector data, as an alternative to options like Shapefile, GeoJSON, FlatGeobuf, or GeoPackage. Similar to how Cloud-Optimized GeoTIFF (COG) builds upon the existing GeoTIFF and TIFF formats, GeoParquet builds upon the existing Parquet format for tabular data. This gives the GeoParquet format a head start because there are many existing libraries to read and write Parquet data that can be extended with geospatial support.
Three things excite me about GeoParquet: it’s cloud-native, so you can read relevant portions of the file directly from cloud storage, without downloading the entire file. It compresses very well and is fast to read and write, so even in non-cloud-native situations, it can speed up workflows. And it integrates well with other new technologies such as GeoArrow.
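To make that concrete, here is a minimal sketch of writing and reading GeoParquet with GeoPandas. This is my own illustration rather than anything from the interview; the file name and compression choice are assumptions.

```python
import geopandas as gpd
from shapely.geometry import Point

# Build a tiny GeoDataFrame; any real dataset works the same way.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "geometry": [Point(0, 0), Point(1, 1)]},
    crs="EPSG:4326",
)

# Parquet compresses well; the compression codec is passed through to pyarrow.
gdf.to_parquet("points.parquet", compression="zstd")

# Reading back is a single call, with no per-feature object conversion.
roundtrip = gpd.read_parquet("points.parquet")
print(roundtrip)
```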
GeoParquet 1.1 introduced support for spatial partitioning. This makes it possible to fetch only specific rows of a file according to a spatial filter.
Note that spatial partitioning is a slightly different concept from spatial indexing. A format like FlatGeobuf or GeoPackage that performs spatial indexing records the position of every single row of your data. This is often useful when you want to quickly access individual rows of data, but it prevents effective compression and becomes harder to scale with large data because the index grows very large. In contrast, spatial partitioning uses chunking: instead of indexing per row, it records the extent of a whole group of rows. This provides flexibility and scalability when writing GeoParquet data. By adjusting the chunk size, the writer can choose the tradeoff between index size and indexing efficiency.
The upside of spatial partitioning is that an entire group of data is indexed and compressed as a single unit. The downside is that if you do want to access only a single row, you’ll have to fetch extra data. The sweet spot is when you want to access a collection of data that’s co-located in one spatial region. (If your access patterns regularly care about single rows of data, a file format like FlatGeobuf or GeoPackage might be better suited to your needs. GeoParquet is tailored to bulk access to data.)
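One way to see this chunking in practice is to inspect a Parquet file's row-group metadata with PyArrow; the per-column min/max statistics recorded for each row group are what let a reader skip whole groups. This is a sketch under my own assumptions: the file name is hypothetical, and in a real GeoParquet 1.1 file the statistics of a bbox covering column are what drive the spatial skipping.

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("buildings.parquet")  # hypothetical file
meta = pf.metadata
print("row groups:", meta.num_row_groups)

for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    col = rg.column(0)  # first column's chunk within this row group
    # `statistics` exposes per-chunk min/max; for a bbox covering column these
    # bounds are what a spatially filtered reader compares against its window.
    print(i, rg.num_rows, col.path_in_schema, col.statistics)
```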
GeoParquet also lets you fetch only specific columns from a dataset. If the dataset contains 100 columns but your analysis only pertains to the geometry column plus 3 other columns, then you can avoid downloading most of the file. For example, with stac-geoparquet, a user might want to find URLs to the red, green, and blue bands of Landsat scenes within a specific spatial area, date range and cloud cover percentage. This query is much more efficient with GeoParquet because it only needs to fetch these six columns, not all of the dozens of attribute columns.
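As a sketch of what that looks like from Python: the URL and column names below are hypothetical, and the `bbox` argument assumes a recent GeoPandas release that supports GeoParquet spatial filtering, plus fsspec for remote reads.

```python
import geopandas as gpd

url = "https://example.com/landsat-scenes.parquet"  # hypothetical dataset

subset = gpd.read_parquet(
    url,
    columns=["geometry", "datetime", "eo:cloud_cover", "red_href"],  # hypothetical names
    bbox=(-122.6, 37.6, -122.2, 37.9),  # (minx, miny, maxx, maxy) in the data's CRS
)
print(len(subset), "matching rows fetched")
```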
GeoParquet also excels even when you don’t care about its cloud-native capabilities. It compresses extremely well on disk and is very fast to read and write. This leads GeoParquet to be used as an intermediate format in analysis workflows, such as with GeoPandas, where a user wants to back up and restore their data quickly.
For example, Lonboard, a Python geospatial visualization library I develop, uses GeoParquet under the hood to move data from Python to JavaScript. This isn’t a cloud-native use case, but GeoParquet is still the best format for the job because it compresses data so well and minimizes network data transfer.
GeoParquet is a file format; when you read or write GeoParquet you need to store it as some in-memory representation. This is where GeoArrow comes in. It’s a new way of representing geospatial vector data in memory and a way that’s efficient to operate on and fast to share between programs.
GeoParquet and GeoArrow are well integrated and have a symbiotic relationship, where the growing adoption of one makes the other more appealing. The fastest way to read and write GeoParquet is to and from GeoArrow. But GeoArrow is not strictly tied to GeoParquet: it can be paired with any file format.
For example, GDAL’s adoption of GeoArrow made it 23x faster to read FlatGeobuf and GeoPackage into GeoPandas. Historically, GeoPandas and GDAL didn’t have a way to share a collection of data at a binary level. So GDAL (via the fiona driver) would effectively create GeoJSON Python objects for GeoPandas to consume. This is horribly inefficient. Since GDAL 3.6, GDAL has supported reading into GeoArrow and since GDAL 3.8 it has supported writing from GeoArrow.
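For anyone wanting to try that Arrow read path, here is a minimal sketch; the file name is hypothetical, and it assumes GeoPandas with the pyogrio engine and a GDAL build recent enough to expose the Arrow stream.

```python
import geopandas as gpd

# use_arrow=True asks GDAL/pyogrio to hand data over as Arrow batches rather
# than materializing per-feature Python objects on the way to GeoPandas.
gdf = gpd.read_file("parcels.fgb", engine="pyogrio", use_arrow=True)
print(len(gdf), gdf.crs)
```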
I’m excited for the potential of GeoParquet and related technologies to replace or complement standalone databases. Existing collections of large geospatial data might use a database like PostGIS, which requires an always-on server to respond to user queries and a developer who knows how to maintain that system.
While Parquet itself is a read-only file format, there are newer technologies, like Apache Iceberg, that build on top of Parquet to enable adding, mutating, and removing data. People are discussing how to add geospatial support to Iceberg. In the medium term, Iceberg will be an attractive serverless method to store large, dynamic geospatial data that would be difficult or expensive to maintain in PostGIS.
Moving data is difficult and expensive and interacting with the cloud from browser applications presents interesting problems of data locality. Indeed, both user devices and the cloud are getting more powerful, but network connectivity, especially over the public internet, isn’t improving proportionally fast. That means there will continue to be a gulf between local, on-device compute, and cloud compute.
We’ve seen this phenomenon lead to the advent of hybrid data systems that span local and remote data stores. New analytical databases like DuckDB or DataFusion and new data frame libraries like Polars have the potential to work with data on the cloud in a hybrid approach, where only the portions of data required for a given query need to be fetched over the network.
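A small sketch of that hybrid pattern, using DuckDB from Python against a remote Parquet file: the URL and column names, including the bbox struct, are my own assumptions, and httpfs performs the HTTP range requests so only the needed row groups and columns travel over the network.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")

query = """
    SELECT category, COUNT(*) AS n
    FROM read_parquet('https://example.com/buildings.parquet')  -- hypothetical URL
    WHERE bbox.xmin > 4.2 AND bbox.xmax < 4.5   -- hypothetical bbox struct column
      AND bbox.ymin > 51.0 AND bbox.ymax < 51.3
    GROUP BY category
    ORDER BY n DESC
"""
print(con.execute(query).fetchdf())  # fetchdf() requires pandas
```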
I conclude that in the future there will be geospatial, browser-based hybrid systems. Non-technical users will navigate to a website and connect to all their local and cloud-based data sources. The system will intelligently decide which data sources to download and materialize into the browser. And it will connect to the tech underlying Lonboard, efficiently rendering that data on an interactive map.
I’ve been making steady progress towards this goal. Through the WebAssembly bindings of my geoarrow-rust project, I’ve been collaborating with engineers at Meta to access Overture GeoParquet data directly from the browser. Downloading an extract from the website will perform a spatial query of Overture GeoParquet data directly from S3 without any server involved.
In general, I find “data boundaries” to present fascinating problems. Both local and cloud devices are powerful, but moving data between those two domains is very slow. When do you need to move data across boundaries; how can you minimize what data you need to move, and how can you make the movement most efficient when you actually do need to move it?
The specific experience with the largest non-technical influence on my career was my time hiking the Pacific Crest Trail. It piqued my interest in thinking about data for the outdoors and climate and gave me a strong base of motivation for learning how to work with geospatial data. I consider having motivation and ideas around what to create much more important than learning how to do something. There’s enough technical material on the internet to learn whatever you want. But you need the motivation to push yourself over that initial hump, the period where things just don’t make sense. It’s been really useful for my learning to have an end goal in mind that keeps me motivated to press on through the “valley of despair” of new concepts.
I’d also point to a video by prolific YouTuber and science communicator Tom Scott: “The Greatest Title Sequence I’ve Ever Seen.” It discusses an introductory sequence to a British TV show and notes how much attention to detail there was, even when virtually no one would notice all those details. I love this quote:
“Sometimes, it’s worth doing things for your craft, just because you can. That sometimes it’s worth going above and beyond, and sweating the small stuff because someone else will notice what you’ve done.”
I tend to be a perfectionist and strive for the best because I want it, not because someone else is asking for it. In my own career, I’ve noticed how attention to detail came in handy in unexpected ways: by going above and beyond in one project, I learned concepts that made later projects possible.
I should also note that a collection of people online have been really strong influences. There are too many to name them all, but Vincent Sarago and Jeff Albrecht were two of the first people I met in the geospatial community and they’ve been incredible mentors. And the writings and code of people like Tom MacWright and Volodymyr Agafonkin have taught me so much.
Discover the inspiring stories of University of Maryland alumni who are now key players at New Light Technologies (NLT). This article celebrates their contributions to our projects and showcases the real-world impact of their academic training.
The AGU Fall Meeting 2024, the largest gathering for Earth and space science, starts this morning, Monday, December 9, and runs through December 13 in Washington, DC. This year, there are many papers on cloud-native geospatial technologies by CNG members and other experts. This blog post highlights some key talks and posters you won’t want to miss.
Dynamic Tiling for Earth Data Visualization: This talk explores dynamic tiling, a method for generating map tiles on-the-fly, allowing for real-time modifications and eliminating the need for constant updates. Presented by Aimee Barciauskas from Development Seed. Learn more.
VirtualiZarr - Create Virtual Zarr Stores Using Xarray Syntax: This paper presents VirtualiZarr, a tool that allows accessing old file formats (like netCDF) as if they were stored in cloud-optimized formats (like Zarr). The authors will demonstrate using the Worthy Ocean Alkalinity Enhancement Efficiency Map dataset, which consists of ~40TB of data spread across ~500,000 netCDF files. Presented by Thomas Nicholas from Worthy, LLC. Learn more.
Supporting Open Science with the CF Metadata Conventions for NetCDF: This talk highlights the Climate and Forecast (CF) conventions, a critical standard for sharing and processing Earth science data in netCDF format and in Zarr/GeoZarr. Presented by Ethan Davis from NSF Unidata. Learn more.
Integrating Zarr with netCDF: Advancing Cloud-Native Scientific Data Interoperability with ncZarr: This paper explores the integration of ncZarr with the netCDF ecosystem. The authors will discuss the current state of this effort, and what can be expected in upcoming netCDF releases. They will outline use cases, and discuss, in practical terms, how to use ncZarr as part of a cloud-based workflow which assumes the involvement of the netCDF data model. Presented by Ward Fisher from the University Corporation for Atmospheric Research. Learn more.
Transforming NASA Earth Observation Data With POWER: This paper explores NASA’s Prediction of Worldwide Energy Resources (POWER) project and its utilization of cloud-optimized formats like Zarr for delivering Earth observation data. Presented by Nikhil Aluru from ORCID, Inc, NASA Langley Research Center. Learn more.
A new sub-chunking strategy for fast netCDF-4 access in local, remote and cloud infrastructures: This paper presents a new strategy for faster access to netCDF-4 data in various environments. A comparison with the cloud-oriented formats Zarr and NCZarr is conducted. Presented by Pierre-Marie Brunet from Centre National d’Études Spatiales. Learn more.
Seamless Arrays - A Full Stack, Cloud-Native Architecture for Fast, Scalable Data Access: This talk will introduce a new approach to accessing Earth systems data called “Seamless Arrays,” which is the ability to easily query a data cube across both spatial and temporal dimensions with consistent low latency. The author will showcase their prototype implementation of Seamless arrays, inspired by cloud-native database systems like Snowflake, consisting of two primary components - a Zarr-based schema and an API layer. Presented by Joseph Hamman from Earthmover. Learn more.
Cubed - Bounded-Memory Serverless Array Processing in Xarray: Cubed is a framework for processing large arrays, designed to be memory-efficient and scalable. It uses Zarr as a persistent storage layer and can run on various cloud platforms. The authors will demonstrate running Cubed in the cloud for various common geoscience analytics workloads. Presented by Tom White from Tom White Consulting. Learn more.
Pangeo-ESGF CMIP6 Zarr Data 2.0 - Streaming Access to CMIP6 data in the cloud that rocks!: This talk will introduce the new data, describe the ingestion architecture, reflect on both successes and challenges of this approach, and elaborate on future directions for Coupled Model Intercomparison Project (CMIP6+ and CMIP7). You will be convinced to use the cloud data for research! Presented by Julius Johannes Marian Busecke from LDEO/Columbia University. Learn more.
Leveraging GPU Acceleration for High-Resolution 3D Visualization of Earth Observation Data: This paper explores a workflow for leveraging GPUs to visualize Earth observation data in 3D, enabling rapid rendering and enhanced analysis capabilities. Presented by Navaneeth Rangaswamy Selvaraj from the University of Alabama in Huntsville. Learn more.
Earth Science Data Access and Discovery and the Cloud: Past, Present, and Future II: This poster explores the evolution of cloud computing for Earth science data access and discovery, along with best practices for maximizing cloud investment. Presented by Douglas J Newman from NASA Goddard Space Flight Center. Learn more.
How can we make cloud computing actually accessible to all scientists?: This talk explores the barriers hindering wider adoption of cloud computing in scientific research and proposes solutions for overcoming them. Presented by Ryan Abernathey from Earthmover. Learn more.
High-Performance Access to Archival Data Stored in HDF4 and HDF5 on Cloud Object Stores Without Reformatting the Files: This paper explores a NASA-sponsored technology, DMR++, which is designed to enhance access to large, historical datasets stored in older formats like HDF4 and HDF5. By leveraging cloud object stores, DMR++ enables efficient data access without the need for time-consuming reformatting. This technology optimizes data storage and access, particularly for massive datasets, and can even store calculated values directly within the data structure, eliminating the need for specialized software tools. Presented by James H R Gallagher from OPeNDAP, Inc. Learn more.
Exploring Innovation in Biodiversity Conservation Decision-Making Through Open Science and Generative AI: This talk explores how recent innovations in open science overcome these barriers and create opportunities to advance decision-making. The authors introduce a cloud-native geospatial visualization tool with chat-driven interfaces, showcasing how it leverages open data layers and generative AI. Presented by Cassidy Buhler from the University of Colorado at Boulder. Learn more.
TensorLakeHouse: A High-Performance, Open-Source Platform for Accelerated Geospatial Data Management with Hierarchical Statistical Indices: This online poster introduces TensorLakeHouse, an open-source platform designed for high-performance geospatial data management and processing. Presenting author: Naomi Simumba from IBM Research. Learn more.
This list provides a starting point for exploring the exciting world of cloud-native geospatial at AGU 2024. Our blog is open source. If we are missing any talks or posters, you can suggest edits on GitHub.
Remember to check the conference website for the latest schedule and additional details. Happy exploring!
Initiative aimed at identifying and addressing critical problems in food security calls for subject matter and geospatial experts to shape its activities and collaborate on innovation projects.
The post Generative AI and the Geospatial Renaissance: A Call to Innovate appeared first on Taylor Geospatial Institute.
Tell Us About Yourself: My name is Antonia Blankenberg. Alongside being a drummer with the fantastic TBL8 Brass, I’m a Lead Consultant in Utilities with Esri Ireland and I’ve been working in GIS for 5 years now. I’ve always been interested in geography, but I first came across GIS during my undergraduate degree. I took […]
The post Maps and Mappers 2024 – October – Antonia Blankenburg appeared first on GeoHipster.
As Geography Awareness Month 2024 has wrapped up, we reflect on the overwhelming success of DMV GIS Day, held on November 20. This inaugural virtual event brought together a dynamic and engaged audience to celebrate the transformative power of Geographic Information Systems (GIS) and the innovation driving progress across the District, Maryland, and Virginia. DMV GIS Day was a standout moment for the geospatial community with world-class speakers, thought-provoking panels, and an impressive turnout.
It feels like only yesterday I was typing “2024 Geohipster Calendar”… 2025 is here and has been “triple checked” for my sanity. The price increased just a bit to $18.00. There are warnings all over the order page about the Canadian Mail Strike, so order appropriately if you are in Canadia. Link to Purchase! […]
The post 2025 Geohipster Calendar appeared first on GeoHipster.
We invite you to join us at CNG Conference 2025 in Snowbird, Utah from April 30 to May 2 2025.
Set against the beautiful backdrop of Snowbird, Utah, this inaugural event will convene the cloud-native geospatial community to learn from one another and collaborate to make geospatial data easier to access and use.
The event will include keynote speeches, panel discussions, hands-on workshops, networking opportunities, and showcases of open-source projects, all designed to enhance attendees’ skills and knowledge. Participants will explore the newest developments in cloud-native geospatial technology, data accessibility, and practical applications.
Save the date: April 30 - May 2, 2025
Where? Snowbird, Utah – about 40 minutes from Salt Lake City International Airport.
Sponsorships: We are developing sponsorship packages. If you are interested, email us at [email protected]
Interested in presenting? We will soon publish a call for proposals for presentations and workshops.
Interested in attending? Let us know if you want to attend by completing the form below:
The event’s venue is Gaston Crommenlaan 4, 9000 Gent, at the TomTom offices. The exact entrance of the building is here. Take elevator 4s to the 5th floor; the door there should be open.
We will welcome everyone at 6:30 PM and aim to start the talks by 6:45 PM.
Our format for the evening will be:
6:30 PM: doors open, set up and general mingling
6:45 PM: we begin the talks with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. After each talk there will be time for 2-3 questions.
We head to a nearby pub for discussion and #geobeers paid for by the sponsors.
For this Geomob, we have our speakers. Would you like to speak on the next Geomob Belgium? Volunteer here
Geomob Belgium is organized by Ben Abelshausen and Han Tambuyzer
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
As we count down to the inaugural DMV GIS Day on November 20, 2024, we are thrilled to announce that this event is not just about celebrating the power of Geographic Information Systems (GIS), but also recognizing the vital role GIS plays in transforming our communities. This year, DMV GIS Day has received a special proclamation from Mayor Muriel Bowser, honoring the event’s significance and acknowledging the profound impact of geospatial technologies on Washington, D.C., and the DMV region.
Every year, on November 11th, we come together as a nation to celebrate Veterans Day, honoring the brave men and women who have served in the United States Armed Forces. This day is more than just a holiday; it's an opportunity to recognize the sacrifices these individuals have made to protect our freedoms and ensure our safety. It's a time to express our deepest gratitude and reflect on the enduring values they bring back to our communities.
This week on Wednesday, November 13, the CNG Virtual Conference 2024 will gather data user practitioners, enthusiasts, and newcomers to explore the latest in cloud-native geospatial technology. Come hear keynotes from NASA, Carto, the University of Tennessee, and speakers from many other organizations sharing updates and insights on cloud-native geo. This online event is an inclusive space for anyone curious about cloud-native geospatial, whether you’re an industry expert, an innovator, or just starting to explore cloud-native concepts. We invite you to join us to learn, connect, and engage with a field that’s rapidly changing how we work with geospatial data.
Cloud-native geospatial represents a transformative approach to handling data. At this conference, you’ll get an inside view into how cloud-native technology makes geospatial data faster, more flexible, and more scalable. And then discover how this shift is driving innovation across the industry.
Your participation is important as we strive to help more data users adopt a cloud-native geospatial approach. By attending, you’re contributing to a growing movement to make geospatial data more accessible and useful for data users.
This conference offers a vendor-neutral space for practitioners across sectors to collaborate and exchange ideas. Whether you’re interested in learning or networking, this is the place to connect with other geospatial professionals.
Beyond learning, this event is about building a community dedicated to advancing geospatial data accessibility and usability. Join us and be part of a community-driven effort to innovate and improve geospatial technology.
We have a range of sessions lined up, covering both foundational and technical topics, organized into three key sections throughout the day:
We will begin the day with foundational topics, including a keynote from Javier de la Torre on breaking down GIS data silos with cloud-native technology. There will also be insights into how the U.S. government implements cloud-native geospatial solutions and demos of featured cloud-native datasets hosted on Source Cooperative.
The second session takes a closer look at the technical side of cloud-native formats with live demonstrations. Dr. Qiusheng Wu’s keynote will explore the transformative role of cloud-native technologies in modern GIS curricula, followed by demos from experts showcasing cloud-native geo solutions like VirtualiZarr, PMTiles, GeoParquet, and Icechunk.
The conference wraps up with a forward-looking session on the power of community in cloud-native geospatial. Dr. Brianna Rita Pagán’s keynote will address the importance of a strong CNG community, and Dana Bauer will lead a discussion on building and sustaining this community over time.
Don’t miss out on the opportunity to connect, learn, and contribute to the future of geospatial technology. Secure your spot today and be part of the growing cloud-native geospatial community. Explore the agenda and purchase tickets here.
A special thank you to our sponsors for their support in making the CNG Virtual Conference 2024 possible.
Follow us on LinkedIn and X for updates, and use #CloudNativeGeo2024 to join the conversation. For any conference-related questions, feel free to reach out at [email protected].
In numerous posts, we have been discussing synthetic populations and their use in agent-based modeling, but many other modeling styles also utilize synthetic populations. In our own work we often spend significant amounts of time creating such synthetic populations, especially those grounded with data, due to the time needed to collect, preprocess and generate the final synthetic population. To alleviate this, we (Na (Richard) Jiang, Fuzhen Yin, Boyu Wang and myself) have a new paper published in Scientific Data, entitled "A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States." Our aim in this paper is to build and provide a geographically explicit synthetic population, along with its social networks, using open data, including that from the latest 2020 U.S. Census, which can be used in a variety of geo-simulation models.
Figure: Summary of the Resulting Datasets.
Specifically, in the paper we outline how we created a synthetic population of 330,526,186 individuals representing America's 50 states and Washington, D.C. Each individual has a set of geographical locations that represent their home, work or school addresses. Additionally, these individuals are not isolated; they are embedded in a larger social setting based on their household, working and studying relationships (i.e., social networks).
The work (e.g., data collection, data preprocessing and generation processes) was coded using Python 3.12 and all the scripts used are available at: https://github.com/njiang8/geo-synthetic-pop-usa while the resulting datasets (85 GB uncompressed) are available at OSF: https://osf.io/fpnc2/.
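As a toy illustration of the household layer of such networks, the sketch below turns a person table into fully connected household cliques with networkx. The column names are my own assumptions, not the schema of the released dataset.

```python
from itertools import combinations

import networkx as nx
import pandas as pd

# Hypothetical synthetic-population excerpt: person IDs and their households.
people = pd.DataFrame(
    {
        "person_id": [1, 2, 3, 4, 5],
        "household_id": ["h1", "h1", "h1", "h2", "h2"],
    }
)

G = nx.Graph()
G.add_nodes_from(people["person_id"])

# Everyone in the same household is connected to everyone else in it.
for _, members in people.groupby("household_id"):
    for a, b in combinations(members["person_id"], 2):
        G.add_edge(a, b, layer="household")

print(G.number_of_nodes(), "people,", G.number_of_edges(), "household ties")
```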
To give you a sense of the paper, below we provide its abstract, along with some results and our efforts to validate the synthetic population, while the full reference and a link to the paper can be found at the bottom of the post.
Abstract:
Within the geo-simulation research domain, micro-simulation and agent-based modeling often require the creation of synthetic populations. Creating such data is a time-consuming task and often lacks social networks, which are crucial for studying human interactions (e.g., disease spread, disaster response) while at the same time impacting decision-making. We address these challenges by introducing a Python based method that uses the open data including that from 2020 U.S. Census data to generate a large-scale realistic geographically explicit synthetic population for America's 50 states and Washington D.C. along with the stylized social networks (e.g., home, work and schools). The resulting synthetic population can be utilized within various geo-simulation approaches (e.g., agent-based modeling), exploring the emergence of complex phenomena through human interactions and further fostering the study of urban digital twins.
Keywords: Synthetic Population, U.S. Census 2020, Agent-Based Modeling, Geo-Simulation, Social Networks.
Figure: Data Generation Workflow and Resulting Datasets.
Figure: A Sample of the Social Networks for one Household: their Home, Work and Educational Social Networks from the Generated Data.
Figure: Sample of Generated Social Networks Extracted from the City of Buffalo, New York: (a) Household; (b) Work; (c) School; (d) Daycare.
Figure: Validation of the Synthetic Population at Different Levels: (a) Population under 18 Different Age Groups; (b) Households under Different Household Types.
Full Reference:
Jiang, N., Yin, F., Wang., B. and Crooks, A.T., (2024), A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States, Scientific Data, 11, 1204. https://doi.org/10.1038/s41597-024-03970-1 (pdf)
One of the major outputs of Taylor Geospatial Engine’s first Innovation Bridge is the recently released Fields of The World dataset, also known as FTW. We wanted to take some time for a deep dive into the core idea, the various parts of the effort, and where things could go from here.
The post Introducing Fields of The World appeared first on Taylor Geospatial Engine.
New Light Technologies (NLT) is excited to continue our long-standing support of So Others Might Eat (SOME) by sponsoring this year’s Trot for Hunger event at an increased Advocate Sponsor level. This marks over a decade of partnership with SOME, during which we've been proud to contribute to their mission of providing essential services, including food, housing, healthcare, and job training, to those in need in Washington, DC.
OGC API – EDR – Part 2 defines a web interface for efficient event-driven data updates, employing a Publish-Subscribe Workflow for real-time notifications.
The post OGC Membership approves OGC API – Environmental Data Retrieval – Part 2: Publish-Subscribe Workflow as an official OGC Standard appeared first on Open Geospatial Consortium.
While in the past we have written about how we can use agent-based models to capture basic patterns of life, and have even developed simulations, until now we have never really demonstrated how we go about this. However, at the SIGSPATIAL 2024 conference we (Hossein Amiri, Will Kohn, Shiyang Ruan, Joon-Seok Kim, Hamdi Kavak, Dieter Pfoser, Carola Wenk, Andreas Zufle and myself) have a demonstration paper entitled "The Pattern of Life Human Mobility Simulation", in which we show the simulation in action (see the abstract below).
If this sounds of interest, below we show the GUI of the model, along with the steps to generate a trajectory dataset or a new map for the simulation. At the bottom of the post you can see the paper's full reference and a link to download it, while at https://github.com/onspatial/generate-mobility-dataset you can find the source code for the enhanced simulation and data-processing tools for you to experiment with.
Abstract:
We demonstrate the Patterns of Life Simulation to create realistic simulations of human mobility in a city. This simulation has recently been used to generate massive amounts of trajectory and check-in data. Our demonstration focuses on using the simulation twofold: (1) using the graphical user interface (GUI), and (2) running the simulation headless by disabling the GUI for faster data generation. We further demonstrate how the Patterns of Life simulation can be used to simulate any region on Earth by using publicly available data from OpenStreetMap. Finally, we also demonstrate how recent improvements to the scalability of the simulation allow simulating up to 100,000 individual agents for years of simulation time. During our demonstration, as well as offline using our guides on GitHub, participants will learn: (1) The theories of human behavior driving the Patterns of Life simulation, (2) how to simulate to generate massive amounts of synthetic yet realistic trajectory data, (3) running the simulation for a region of interest chosen by participants using OSM data, (4) learn the scalability of the simulation and understand the properties of generated data, and (5) manage thousands of parallel simulation instances running concurrently.
Keywords: Patterns of Life, Simulation, Trajectory, Dataset, Customization
Figure: Steps to generate a trajectory dataset.
Full reference:
Amiri, H., Kohn, W., Ruan, S., Kim, J-S., Kavak, H., Crooks, A.T., Pfoser, D., Wenk, C. and Zufle, A. (2024) The Pattern of Life Human Mobility Simulation (Demo Paper), ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Atlanta, GA. (pdf)
In the paper we show how we can utilize a method to create a geographically explicit synthetic population, along with capturing its social networks, and how this can be used to study contagious disease spread (and various lineages of the disease) in Western New York. If this sounds of interest, below you can read the abstract from the paper, see some of the results, and find the full reference and link to the paper, while the model itself and the data needed to run it are available at https://osf.io/zrtuj/
Abstract
The COVID-19 pandemic has reshaped societies and brought to the forefront simulation as a tool to explore the spread of the diseases including that of agent-based modeling. Efforts have been made to ground these models on the world around us using synthetic populations that attempt to mimic the population at large. However, we would argue that many of these synthetic populations, and therefore the models using them, miss the social connections which were paramount to the spread of the pandemic. Our argument is that contagious diseases mainly spread through people interacting with each other and therefore the social connections need to be captured. To address this, we create a geographically-explicit synthetic population along with its social network for the Western New York (WNY) Area. This synthetic population is then used to build a framework to explore a hypothetical contagious disease inspired by various of COVID-19. We show simulation results from two scenarios utilizing this framework, which demonstrate the utility of our approach in capturing the disease dynamics. As such we show how basic patterns of life along with interactions driven by social networks can lead to the emergence of disease outbreaks and pave the way for researchers to explore the next pandemic utilizing agent-based modeling with geographically explicit social networks.
Keywords: Agent-based Modeling, Synthetic Populations, Social Networks, COVID-19, Disease Modeling.
Figure: Single Lineage Results: (a) Overall SEIR Dynamic; (b) Contact Tracing Example.
Figure: Western New York Commuting Pattern.
Reference:
Jiang N., Crooks, A.T. (2024), Studying Contagious Disease Spread Utilizing Synthetic Populations Inspired by COVID-19: An Agent-based Modeling Framework, Proceedings of the 7th ACM SIGSPATIAL International Workshop on Geospatial Simulation (GeoSim 2024), Atlanta, GA., pp. 29-32. (pdf)
The first Geomob Netherlands took place in Utrecht on the evening of Thursday, October 31st, 2024 at the NOVI University of Applied Sciences: Newton House, 4th floor, Newtonlaan 247, 2584 BH Utrecht OpenStreetMap, Google Maps
Doors open at 16.30, set up and general mingling
Talks begin at 17:00 with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. After each talk there will be time for 2-3 questions.
We vote for the best speaker. The winner will receive the best speaker prize and unending glory (see the full list of all past winners).
Discussion and #award and #geobeers paid for by the sponsors: NOVI Hogeschool.
Ed Freyfogle, Looking back at 15+ years of Geomob
Leonardo Mauri, T(w)o tree(s) or not to tree: the 3+30+300 rule
Petra Schoon, Spatial Autocorrelation
Edward Betts, Tools for linking Wikidata and OpenStreetMap
Would you like to speak at a future event? Speaker volunteers are always welcome
Geomob Netherlands (GeomobNL) is organized by Dirk Voets, Leonardo Mauri, and Melissa Kwakernaak.
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
Please share the event details with everyone you know who may find the evening interesting.
If you can’t attend (or even if you can) be sure to sign up for the monthly Geomob mailing list, where we announce upcoming events.
Just a quick post: in the recently released Encyclopedia of Human Geography edited by Barney Warf, we were asked to write a short chapter entitled "Agent-based Models and Geography." In the chapter we discuss how, over the last several decades, agent-based modeling has gained widespread adoption in geography, and we introduce the reader to what agent-based models are, how they have developed, and the types of geographical applications that can be explored with them, especially when linked to Geographical Information Systems (GIS). The chapter concludes with a brief summary along with a discussion of challenges and opportunities with agent-based modeling (ABM). If this sounds of interest, below you can find the full reference and link to the chapter.
Figure: Example application domains for agent-based models over various spatial and temporal scales. More examples and further details can be found at https://www.gisagents.org/
Full Reference:
Crooks, A.T. and Jiang, N. (2024), Agent-based Models and Geography, in Warf, B. (ed.), The Encyclopedia of Human Geography, Springer, Cham, Switzerland, https://doi.org/10.1007/978-3-031-25900-5_258-1. (pdf)
During the COVID-19 pandemic, social media became an important hub for public discussions on vaccination. However, it is unclear how the rise of cyber space (i.e., social media), combined with traditional relational spaces (i.e., social circles) and physical space (i.e., spatial proximity), together affect the diffusion of vaccination opinions and produce different impacts on urban and rural populations' vaccination uptake. This research builds an agent-based model utilizing the Mesa framework to simulate individuals' opinion dynamics towards COVID-19 vaccines, their vaccination uptake and the emergent vaccination rates at a macro level for New York State (NYS). By using a spatially explicit synthetic population, our model can accurately simulate the vaccination rates for NYS (mean absolute error = 6.93) and for the majority of counties within it (81%). This research contributes to the modeling literature by simulating individuals' vaccination behaviors, which are important for disease spread and transmission studies. Our study extends geo-simulations into hybrid-space settings (i.e., physical, relational, and cyber spaces).
Keywords: Agent-based modeling, GIS, Information diffusion, Hybrid spaces, Social networks, Health informatics, Vaccines, COVID-19.
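For readers new to this kind of model, here is a deliberately simplified opinion-dynamics sketch in plain Python. It is not the paper's Mesa implementation; the network structure, thresholds and update rule are illustrative assumptions only.

```python
import random

random.seed(42)

n_agents = 200
# Opinion toward vaccination in [0, 1]; above 0.6 the agent gets vaccinated.
opinions = [random.random() for _ in range(n_agents)]
# A toy "hybrid space": each agent listens to a few random peers (a stand-in
# for cyber/social circles) and to its index neighbors (a stand-in for space).
peers = [random.sample(range(n_agents), 4) for _ in range(n_agents)]

for step in range(50):
    new_opinions = []
    for i, op in enumerate(opinions):
        neighbors = peers[i] + [(i - 1) % n_agents, (i + 1) % n_agents]
        neighborhood_mean = sum(opinions[j] for j in neighbors) / len(neighbors)
        # Nudge each agent toward the average opinion it is exposed to.
        new_opinions.append(op + 0.1 * (neighborhood_mean - op))
    opinions = new_opinions

vaccination_rate = sum(op > 0.6 for op in opinions) / n_agents
print(f"simulated vaccination rate: {vaccination_rate:.2%}")
```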
Figure: Modeling process and structure: from data to agent-behaviors.
Figure: Mapping the differences (i.e., mean absolute error (MAE)) in vaccination rate between simulated and ground truth data.
Yin, F., Jiang, Na., Crooks, A.T., Laurian, L. (2024), Agent-based Modeling of Covid-19 Vaccine Uptake in New York State: Information Diffusion in Hybrid Spaces, Proceedings of the 7th ACM SIGSPATIAL International Workshop on Geospatial Simulation (GeoSim 2024), Atlanta, GA., pp. 11-20. (pdf)
How to contribute to GRASS GIS development: Guidance for new developers in the GRASS GIS Project.
The post How to contribute to GRASS GIS development appeared first on Markus Neteler Consulting.
On October 2-3, 2024, New Light Technologies (NLT) participated for the fourth time in the Innovation Summit for Preparedness & Resilience (InSPIRE), organized by the NAPSG Foundation. Held at Indiana University in Indianapolis, the event brought together leaders in geospatial technology, public safety, and emergency management to explore cutting-edge geospatial solutions for disaster management. NLT proudly served as a Platinum sponsor, underscoring our commitment to enhancing the safety, resilience, and well-being of communities. NLT has long been dedicated to fostering partnerships between public and private sectors and academia in the field of disaster management, advancing public safety technologies that make communities more resilient.
Abstract:
In the United States, educational attainment and student retention in higher education are two of the main focuses of higher education research. Institutions are constantly looking for ways to identify areas of improvement across different aspects of the student experience on university campuses. This paper combines Department of Education data over a 10-year period, U.S. Census data, and higher education theory on student retention, to build an agent-based model of student behavior. Furthermore, we model student social interactions with their peers along with considering environmental components (e.g., urban vs. rural campuses) and institution personnel to explore the elements that increase the likelihood of student retention. Results suggest that both social interactions and environmental components make a difference in student retention, suggesting that higher education institutions should consider new ways to accommodate learning needs that promote better student outcomes.
Keywords: Agent-Based Model, College Campuses, Higher Education, Department of Education, Social Interactions, Student Retention.
Figure: Student retention 2007-2021 by institutional support and urbanicity for Urban, Suburban, Town, and Rural areas.
Figure: Model Graphical User Interface.
Figure: Process Overview and Model Logic.
Figure: Retention results for Urban (left) and Rural (right) settings of low support and low sense of belonging while high motivation (grit).
Reference:
Stine, A.A. and Crooks, A.T. (2024), Retention in Higher Education: An Agent-Based Model of Social Interactions and Motivated Agent Behavior, Proceedings of the 2024 International Conference of the Computational Social Science Society of the Americas, Santa Fe, NM. (pdf)
As part of our showcase of the seed grant awardees for the Field Boundaries for Agriculture initiative, Taylor Geospatial Engine is pleased to highlight Jed Sundwall and Radiant Earth.
The post Innovation Bridge Community Spotlight: Jed Sundwall and Radiant Earth appeared first on Taylor Geospatial Engine.
We are excited to announce the founding CNG Editorial Board, a group of leaders in our community who have graciously volunteered to guide our work. The experience and good judgment of our board help us identify new technologies on the horizon and which fads we can safely ignore as we create our events and content.
The editorial board is also designed to provide opportunities for visibility and leadership to our community members. Half of the CNG editorial board will be replaced every 12 months with new members selected by the existing editorial board. This will allow us to gain expertise from more people throughout our community and support emerging leaders.
We are immensely grateful to our board for their support as we build CNG together.
Geomob Barcelona took place at 6:00 PM on Wednesday the 16th of October, 2024 at CoWorkIdea, at Carrer de Torres i Amat, 21, First Floor.
The goal of Geomob is to provide a forum to learn and exchange ideas about any interesting services and projects that deal with location. Everyone working in or curious about the location space or with location services is welcome. You absolutely do not need to be some sort of GIS expert (though GIS experts are of course welcome as well).
doors open at 18:00, set up and general mingling
at 18:30 we begin the talks with a very brief introduction
Each speaker will have slides and speak for 10-15 minutes. After each talk there will be time for 2-3 questions. The talks will be in English.
We vote - using FeatureUpvote - for the best speaker. The winner will receive a SplashMap and unending glory (see the full list of all past winners).
We head to a nearby bar for discussion and #geobeers paid for by the sponsors.
Dan Hirst, Geospatial for Carbon Offsets
Joel Grau Bellet, Institut Cartogràfic i Geològic de Catalunya
Alexander Semenov, Geosemantica: introduction to our geospatial use cases
Want to speak at a future event? Please volunteer.
GeomobBCN is organized by Ed Freyfogle
Geomob would not be possible without speakers and sponsors. Over the years we have had so many fantastic talks, spanning the range from inspirational to informative to weird and wacky. See the list of all the past speakers. Please get in touch if you would like to speak at a future Geomob.
Please share the event details with everyone you know who may find the evening interesting.
If you can’t attend (or even if you can) be sure to sign up for the monthly Geomob mailing list, where we announce upcoming events.
The post Migration of grass-dev mailing list to OSGeo Discourse appeared first on Markus Neteler Consulting.
At TGE, our guiding principle is to contribute purposefully by elevating research-grade innovation into user-friendly and accessible capabilities that have broad awareness and reach. Because we are a very small team, this means that we are heads down most of the time.
It’s important this week to take a minute to pop up and do a little celebrating! Two very different efforts, that are both critically important to us, have reached milestones.
The post Celebrating Our Community’s Success appeared first on Taylor Geospatial Engine.
The post TGI Connect Newsletter: September 2024 appeared first on Taylor Geospatial Institute.
The post Geo-Resolution 2024 Showcased a Thriving Geospatial Ecosystem appeared first on Taylor Geospatial Institute.
The post TGI Spotlight – Shaowen Wang, TGI Research Associate and Research Council member appeared first on Taylor Geospatial Institute.
In case you haven’t seen the news – we had a lot of rain last Friday. A hurricane hit Florida and then proceeded to drench a large portion of the world I inhabit. I was fine. Chattanooga was fine. Northeast Tennessee and Western North Carolina aren’t. The more I read up, that also included […]
The post Paper Maps or Something close appeared first on North River Geographic Systems Inc.
Back to entry 1
I was glancing at the New York Times and saw that Catherine, the Princess of Wales, had released an update on her treatment. And I thought, “wow, I hope she’s doing well”. And then I thought, “wow, I bet she gets a lot of positive affirmation and support from all kinds of people”.
I mean, she’s a princess.
Even us non-princesses, we need support too, and I have to say that I have been blown away by how kind the people around me in my life have been. And also how kind the other folks who I have never really talked with before have been.
I try to thank my wife as often as I can. It is hard not to feel like a burden when I am, objectively, a burden, no matter how much she avers I am not. I am still not fully well (for reasons), and I really want to be the person she married, a helpful full partner. It is frustrating to still be taking more than I’m giving.
From writing about my experience here, I have heard from other cancer survivors, and other folks who have travelled the particular path of colorectal cancer treatment. Some of them I knew from meetings and events, some from their own footprint on the internet, some of them were new to me. But they were all kind and supportive and it really helped, in the dark and down times.
From my work on the University of Victoria Board of Governors, I have come to know a lot of people in the community there, and they were so kind to me when I shared my diagnosis. My fellow board members stepped in and took on the tasks I have not been able to do the past few months, and the members of the executive and their teams were so generous in sending their well-wishes.
And finally, my employers at Crunchy Data were the best. Like above and beyond. When I told them the news they just said “take as much time as you need and get better”. And they held to that. My family doctor asked “do you need me to write you a letter for your employer” and I said “no, they’re good”, and he said, “wow! don’t see that very often”. You don’t. I’m so glad Crunchy Data is still small enough that it can be run ethically by ethical people. Not having to worry about employment on top of all the other worries that a cancer diagnosis brings, that was a huge gift, and not one I will soon forget.
I think people (and Canadians to a fault, but probably people in general) worry about imposing, that communicating their good thoughts and prayers could be just another thing for the cancer patient to deal with, and my personal experience was: no, it wasn’t. Saying “thanks, I appreciate it” takes almost no energy, and the boost of hearing from someone is real. I think as long as the patient doesn’t sweat it, as long as they recognize that “acknowledged! thanks!” is a sufficient response, it’s all great.
Fortunately, I am not a princess, so the volume was not insuperable. Anyways, thank you to everyone who reached out over the past 6 months, and also to all those who just read and nodded, and maybe shared with a friend, maybe got someone to take a trip to the gastroenterologist for a colonoscopy.
Talk to you all again soon, inshala.
The post GRASS GIS PSC Elections 2024: nomination period ongoing appeared first on Markus Neteler Consulting.
Figure: Lineage distribution of SARS-CoV-2 across geographic regions of Ontario, Canada, Western New York, and New York City over time.
Abstract:
The COVID-19 pandemic has prompted an unprecedented global effort to understand and mitigate the spread of the SARS-CoV-2 virus. In this study, we present a comprehensive analysis of COVID-19 in Western New York (WNY), integrating individual patient-level genomic sequencing data with a spatially informed agent-based disease Susceptible-Exposed-Infectious-Recovered (SEIR) computational model. The integration of genomic and spatial data enables a multi-faceted exploration of the factors influencing the transmission patterns of COVID-19, including genetic variations in the viral genomes, population density, and movement dynamics in New York State (NYS). Our genomic analyses provide insights into the genetic heterogeneity of SARS-CoV-2 within a single lineage, at region-specific resolutions, while our population analyses provide models for SARS-CoV-2 lineage transmission. Together, our findings shed light on localized dynamics of the pandemic, revealing potential cross-county transmission networks. This interdisciplinary approach, bridging genomics and spatial modeling, contributes to a more comprehensive understanding of COVID-19 dynamics. The results of this study have implications for future public health strategies, including guiding targeted interventions and resource allocations to control the spread of similar viruses.
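For orientation, the compartmental logic behind the SEIR acronym can be sketched in a few lines of Python; this is a plain, non-spatial toy with made-up parameter values, not the paper's spatially informed agent-based model.

```python
N = 1_000_000                              # population size (illustrative)
beta, sigma, gamma = 0.3, 1 / 5, 1 / 10    # transmission, incubation, recovery rates per day

S, E, I, R = N - 10, 0, 10, 0              # start with 10 infectious individuals
for day in range(180):
    new_exposed = beta * S * I / N         # susceptible people who get exposed today
    new_infectious = sigma * E             # exposed people who become infectious
    new_recovered = gamma * I              # infectious people who recover
    S -= new_exposed
    E += new_exposed - new_infectious
    I += new_infectious - new_recovered
    R += new_recovered
    if day % 30 == 0:
        print(f"day {day:3d}  S={S:10.0f}  E={E:9.0f}  I={I:9.0f}  R={R:10.0f}")
```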
Figure: Commuter behavior dynamics in WNY. Estimated commuter populations originating in a specific county. (A) Commuter behavior with Erie County origins. (B) Commuter behavior from Niagara County origin. (C) Commuter behavior from Monroe County origin. (D) Composite commuter behavior network.
Full Reference:
Bard, J.E., Jiang, N., Emerson, J., Bartz, M., Lamb, N.A., Marzullo, B.J., Pohlman, A., Boccolucci, A., Nowak, N.J., Yergeau, D.A., Crooks, A.T. and Surtees, J. (2024), Genomic Profiling and Spatial SEIR Modeling of COVID-19 Transmission in Western New York, Frontiers in Microbiology, 15. Available at https://doi.org/10.3389/fmicb.2024.1416580 (pdf)
Tell Us About Yourself I grew up in Kuala Lumpur, Malaysia, and moved to Austin, Texas, where I was an undergraduate student and then a geologist. I’m currently graduating in October from an international cartography master’s program based in Europe, and I am excited to see what life brings next. Tell us the story behind […]
The post Maps and Mappers of the 2024 Calendar – September – Phoebe Ly appeared first on GeoHipster.
The PostGIS Team is pleased to release PostGIS 3.5.0! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.
This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5 is needed.
Cheat Sheets:
This release is a feature release that includes bug fixes since PostGIS 3.4.3, new features, and a few breaking changes.
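If you want to confirm which GEOS, Proj and SFCGAL versions your own PostGIS build links against, `postgis_full_version()` reports them in one string; a quick check from Python might look like the sketch below (the connection string is a placeholder).

```python
import psycopg2

# Placeholder DSN; point this at a database with the postgis extension installed.
conn = psycopg2.connect("dbname=gis user=postgres")
with conn, conn.cursor() as cur:
    # Reports PostGIS, GEOS, Proj and (if enabled) SFCGAL versions in one string.
    cur.execute("SELECT postgis_full_version();")
    print(cur.fetchone()[0])
conn.close()
```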
As the digital transformation of industries accelerates, the management of Digital Twins and Building Information Models (BIM) has emerged as a critical challenge. These models are fundamental in sectors such as construction, urban planning, and facility management, providing detailed representations of physical assets and enabling advanced simulations and analyses. However, the complexity of these models, coupled with the involvement of multiple stakeholders, presents significant challenges in maintaining data integrity, managing derivative works, and ensuring the enforcement of contracts.
Blockchain technology offers a powerful solution to these challenges by providing a secure, decentralized, and transparent system for managing Digital Twins and BIM models. Here’s how blockchain can revolutionize this space:

Data Integrity and Security:
- Example: Consider a large-scale infrastructure project involving a Digital Twin of a smart city. The model needs to be frequently updated with data from various sensors, construction teams, and urban planners. With blockchain, every update to the Digital Twin is recorded on an immutable ledger, ensuring that the data remains consistent and secure over time. This prevents unauthorized changes, ensuring that all stakeholders can trust the data’s accuracy and reliability.
- Benefit: Blockchain’s decentralized ledger makes it nearly impossible for any single entity to alter the data without detection, significantly reducing the risk of data tampering and ensuring that the Digital Twin remains a trustworthy source of information throughout the project lifecycle.

Provenance and Traceability:
- Example: In the development of a new commercial building, the BIM model undergoes multiple iterations, with contributions from architects, engineers, and contractors. Blockchain enables each modification to the model to be logged with a timestamp and the identity of the contributor. If a design flaw is later discovered, stakeholders can trace back through the blockchain to identify who made each change, when it was made, and the rationale behind it.
- Benefit: This level of traceability ensures accountability and transparency, allowing project managers to easily audit the development process and maintain a clear history of the model’s evolution. It also simplifies the process of compliance with regulatory requirements by providing an unalterable record of the model’s development.

Smart Contracts and Licensing:
- Example: Imagine a scenario where a BIM model is used by multiple subcontractors for different aspects of a construction project. Each contractor needs access to specific parts of the model under certain licensing agreements. With blockchain, smart contracts can automatically enforce these agreements, granting access only to authorized users and ensuring that usage complies with the predefined terms. For instance, a smart contract could be set to automatically release payment when a subcontractor completes a specific task using the BIM model, and this action would be recorded on the blockchain.
- Benefit: Smart contracts streamline the enforcement of licensing and usage agreements, reducing the administrative burden on project managers and ensuring compliance without the need for manual oversight. This automation also reduces the risk of legal disputes by ensuring that all parties adhere to the agreed-upon terms.

Collaboration and Version Control:
- Example: In a complex project like the construction of a new transportation hub, multiple teams, ranging from civil engineers to environmental consultants, need to collaborate on the BIM model. Blockchain facilitates this by providing a single, shared ledger where every change to the model is recorded and visible to all authorized parties. If an engineer updates the model with new structural data, this update is immediately available to the entire team, and the blockchain ensures that the change is properly recorded and cannot be overwritten without consensus.
- Benefit: This approach eliminates version conflicts and ensures that all stakeholders are working from the most up-to-date data. It fosters a more collaborative environment by enabling secure, transparent sharing of information across teams, reducing the likelihood of errors and rework.

Conclusion: The integration of blockchain technology into the management of Digital Twins and BIM models provides a robust solution to some of the most pressing challenges in the construction and infrastructure sectors. By ensuring data integrity, enhancing traceability, automating contract enforcement, and improving collaboration, blockchain not only addresses current challenges but also paves the way for new opportunities in project management and digital asset management. As industries continue to adopt digital technologies, blockchain’s role in managing the complex lifecycle of Digital Twins and BIM models will become increasingly vital.
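As a purely conceptual sketch of the "immutable ledger" idea (not a production blockchain, and the field names are illustrative assumptions), chaining each model update's hash to the previous one makes later tampering detectable:

```python
import hashlib
import json
import time


def add_update(chain, author, description, model_payload):
    """Append a model update whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {
        "timestamp": time.time(),
        "author": author,
        "description": description,
        "payload_hash": hashlib.sha256(model_payload).hexdigest(),
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    chain.append(entry)
    return entry


def verify(chain):
    """Recompute every link; return False if any earlier entry was altered."""
    for i, entry in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        body = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != expected_prev or entry["hash"] != recomputed:
            return False
    return True


chain = []
add_update(chain, "architect", "initial BIM model", b"...ifc bytes...")
add_update(chain, "engineer", "updated structural loads", b"...ifc bytes v2...")
print(verify(chain))  # True until someone edits an earlier entry
```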
I’ve been too busy to write anything as of late. I have a lot to talk about – just not much time to do it. So FOSS4GNA happened on Sept 9-11 2024. Probably the biggest thing for me during the conference was that we had two BOFs on QGIS. Granted – I think it was supposed […]
The post QGIS US User Group appeared first on North River Geographic Systems Inc.
Back to entry 1
What happened there, I didn’t write for three months! Two words: “complications”, and “recovery”.
In a terrifying medical specialty like cancer treatment, one of the painful ironies is that patients spend a lot of time suffering from complications and side effects of the treatments, rather than the cancer itself. In my case and many others, the existence of the cancer isn’t even noticeable without fancy diagnostic machines. The treatments, on the other hand… those are very noticeable!
A lot of this comes with the territory of major surgery and dangerous chemicals. My surgery came with its own set of possible complications, including but not limited to: incontinence, sexual dysfunction, urinary dysfunction, and sepsis.
Fortunately, I avoided all the complications specific to my surgery.
What I did not avoid was a surprisingly common complication of spending some time in a hospital while taking broad-spectrum antibiotics: I contracted the “superbug” Clostridioides difficile, aka c.diff.
Let me tell you, finding you have a “superbug” is a real bummer, and c.diff lives up to its reputation. Like cancer, it is hard to kill, it does quite a bit of damage while it’s in you, and the things that kill it also do a lot of damage to your body.
Killing my c.diff required a couple of courses of specialized antibiotics (vancomycin), that in addition to killing the c.diff also killed all the other beneficial bacteria in my lower intestine.
So, two months after surgery, I was recovering from:
Not surprisingly, having all those things at once makes for a much longer recovery, and a pretty up-and-down one. My slowly recovering microbiota is in constant flux, which results in some really surprising symptoms.
I had not really understood the implications of gut/brain connection, until this journey showed me just how tightly bound my mental state was to the current condition of my guts. The anxiety I have experienced as a result of my c.diff exposure has been worse, amazingly, than what I felt after my initial cancer diagnosis. One was in my head, but the other was in my gut.
I have also developed a much more acute sympathy for people suffering from long Covid and other chronic diseases. The actual symptoms are bad enough, but the psychological effect of the symptom variability is really hard to deal with. Bad days follow good days, with no warning. I have mostly stopped voicing any optimism about my condition, because who knows what tomorrow will bring.
When people ask me how I’m doing, I shrug.
One thing I have got going for me, that chronic disease sufferers do not, is a sense that I am in fact improving. I started journaling my symptoms early in the recovery process, and I can look back and see definitively that while things are unpredictable day to day, or even week to week, the long term trajectory is one of improvement.
Without that, I think I’d go loopy.
Anyways, I am now roughly three months out from my last course of antibiotics, and I expect it will be at least another three months before I’m firing on all cylinders again, thanks mostly to the surgical complication of acquiring c.diff. If I was just recovering from the surgery, I imagine I would be much closer to full recovery.
Last week I had the honor of giving a keynote talk entitled "Exploring the World from the Bottom Up with GIS and Agent-based Models: Past, Present and Future" at the 19th annual Social Simulation Conference, the annual conference of the European Social Simulation Association (ESSA). Attending the conference was a great experience: I was exposed to various applications of social simulation, caught up with old friends and met many new people. For anyone interested, I have pasted the abstract from my talk below; the slides from the talk can be found here.
Abstract
We have seen an explosion in the availability of data, along with the use of such data in agent-based models. At the same time, we have seen huge growth in computational power and in associating agent-based models with real-world locations through the use of geographical information systems (GIS). This talk will explore how geographically explicit agent-based models have grown and evolved over the last 20 years, taking advantage of the explosion of data and computational power. It will showcase a selection of applications of agent-based models and how they can be used to explore the world from the bottom up, with a specific emphasis on cities and regions. Through examples, I will demonstrate how GIS can be used to build agent-based models, ranging from using spatial data to create the artificial worlds that the agents inhabit to utilizing demographic data to build synthetic populations. However, it is not just data that is important when building agent-based models, but also how we incorporate human behavior and theory into such models, along with considerations of connecting agents through various types of social and spatial networks. While this might appear simple, there are many challenges associated with it, which will be discussed using representative examples ranging from basic patterns of life to vaccination uptake. The talk will conclude with what opportunities are emerging in light of the recent growth in artificial intelligence (AI) with respect to building agent-based models.
Keywords: Agent-based modeling, AI, GIS, Social Networks, Cities.
Figure: Types of Problems Agent-Based Models have Explored
Figure: Growth of Geographical Agent-based models
Crooks, A.T. (2024), Exploring the World from the Bottom Up with GIS and Agent-based Models: Past, Present and Future. The 19th Annual Social Simulation Conference, 16th –20th September, Cracow, Poland.
As part of our showcase of the seed grant awardees for the Field Boundaries for Agriculture initiative, Taylor Geospatial Engine is pleased to highlight Dr. Nathan Jacobs. Dr. Jacobs is Director of the Multimodal Vision Research Laboratory (MVRL) and a Professor of Computer Science and Engineering at the McKelvey School of Engineering at Washington University in St. Louis, MO. His research centers on developing learning-based algorithms and systems for extracting information from large-scale image collections.
The post Innovation Bridge Community Spotlight: Dr. Nathan Jacobs appeared first on Taylor Geospatial Engine.
We’re excited to announce the Cloud-Native Geospatial Forum (CNG) membership program. We have changed our name, but not our commitment to making geospatial data easier to access and use.
As geospatial data becomes more important, so does the need for a vendor-agnostic, trusted platform to help people understand the true benefits and limitations of geospatial technology. CNG is stepping up to meet this need, providing a neutral forum where geospatial data users can come together and exchange ideas, share experiences, and learn from one another. Our membership program is designed to unite and empower a diverse community of geospatial professionals from across industries and specialties.
In the last few years, we’ve witnessed how the cloud ecosystem, fueled by open standards, has changed how we work with geospatial data online. Cloud-native technologies have created capabilities that have been rapidly adopted across the commercial and public sectors. Despite this, many geospatial communities remain siloed by industry or domain, limiting knowledge sharing across the community. CNG bridges this gap by creating opportunities for government agencies, academic institutions, nonprofits, international organizations, commercial enterprises, and startups to connect and collaborate.
The membership program was created in this spirit.
CNG membership supports a community that is at the forefront of geospatial innovation. Members benefit from:
Over time, we hope to provide more benefits to CNG members.
We believe knowledge and collaboration should be accessible to everyone, and we believe the best way to make this possible is with support from our community. Your membership fees allow us to convene our community and create and share as much free content as possible.
Membership tiers and pricing are designed to be accessible to all with options for commercial organizations, academic institutions, professionals, and students. We also waive membership fees for individuals who can’t afford them. For more information on membership tiers and benefits, visit cloudnativegeo.org/join.
Join us today and help create the future of geospatial data.
More than just a data platform, Prescient is a gateway to smarter, safer, and more efficient operations. Designed to help midstream companies optimize Lidar data, Prescient makes data management easier and boosts risk mitigation.
The post Enhancing Risk Management in the Midstream Oil and Gas Industry with LiDAR and Prescient appeared first on Sparkgeo.
In May last year, we announced the “Cloud-Native Geospatial Foundation” as an initiative to “help people adopt patterns and best practices for efficiently sharing Earth science data on the Internet using a cloud-native approach.”
Since then, we’ve done quite a bit.
We’ve published 29 blog posts and quickly attracted over 1,000 followers on X. We created a new Slack workspace which has over 400 members and 200 monthly active users. Combined with some of Radiant Earth’s previously created online channels, we now have a social media following of over 6,000 across X, LinkedIn, and Medium, and our quarterly newsletter has over 8,000 subscribers.
We have hosted in-person sprints for Zarr, STAC, and GeoParquet. We’ve been a part of the first SatCamp, the ESIP 2023 Summer and Winter Meeting Cloud Computing sessions, and convened a two-day workshop in Rwanda focused on improving access to air quality data throughout Africa. We’ve hosted a series of webinars to introduce people to cloud-native concepts, including a series made specifically for the Kenyan Space Agency. More virtual events are scheduled for this October and November.
Our team is rightfully proud of what we’ve accomplished, and we’ve learned a lot about our community, what they need from us, and what we have to offer. Today we are renaming the Cloud-Native Geospatial Foundation to the Cloud-Native Geospatial Forum (CNG). Here’s why:
There’s a clear need for vendor-agnostic sources of expertise and guidance on innovations in geospatial technology. Massive cloud providers, new startups, and the open-source community are all relentlessly creating new technologies and capabilities that benefit geospatial data users. We are uniquely positioned to provide a neutral forum in which geospatial data users can teach each other how to benefit from these advances.
In far too many instances, our community is segregated by industry or domain. CNG deliberately convenes community members from a diverse set of organizations including government agencies, academic institutions, international organizations, nonprofits, commercial enterprises, and startups. While we all work with geospatial data, we benefit from learning from peers who have different perspectives, priorities, constraints, and capabilities. This subtle name change will make it explicit that our primary role is as a convener and enabler of conversations for our community members.
Here’s how we’re going to do it:
We will continue to create as much free content as possible and put it on the open web. We will maintain the CNG blog and continue to produce webinars.
We are launching a paid membership program to directly raise funds from our community. This initiative will enable us to create free content that can engage a broader group of geospatial data users. Membership is designed to be affordable, accessible, and truly beneficial to our members. Membership tiers, their costs, and benefits are available at cloudnativegeo.org/join.
We have started an open Discourse forum to ensure that community discussions are available on the open Internet.
We are organizing our own conferences. A virtual conference will be held on November 13 this year and an in-person conference is planned for May 2025 somewhere in the US. We believe there’s a gap in the market for events of this type. Our goal is to make these conferences profitable, enabling us to offer subsidies or scholarships to welcome new members into our community. Look for opportunities to sponsor and exhibit at these events.
We are assembling an editorial board. We have learned there’s no easy definition of “cloud-native.” Instead of coming up with hardened definitions and rubrics, we will convene a board of leaders in our community to continually discuss how we should pursue our mission of making data easier to access and use. Editorial board members will provide feedback on the agendas for our events and provide reviews of blog posts published on the CNG Blog.
(As an aside, the difficulty of defining “cloud-native” is part of the reason why we’ll usually refer to ourselves as CNG. The other reason is because Cloud-Native Geospatial Forum is simply too much of a mouthful.)
As before, Radiant Earth can provide fiscal sponsorship for initiatives important to our community, as we did with STAC. We will continue to collaborate with the Open Geospatial Consortium and any other standards bodies to identify emerging patterns and best practices that are candidates for standardization.
There’s much more we want to do (such as creating a job board, running regular surveys of our community, and formalizing our process of organizing development sprints) but the activities listed here are where we’re going to start.
If you’ve made it this far, please consider joining the CNG forum, either as an individual or for your organization. We can use your help to grow our community and unlock the potential of geospatial data.
The PostGIS Team is pleased to release PostGIS 3.5.0beta1! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.
This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5 is needed.
This is a beta of a major release; it includes bug fixes since PostGIS 3.4.3 as well as new features.
The PostGIS Team is pleased to release PostGIS 3.5.0rc1! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.
This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5 is needed.
This is a release candidate of a major release; it includes bug fixes since PostGIS 3.4.3 as well as new features.
Changes since 3.5.0beta1 are as follows:
The Taylor Geospatial Institute (TGI), in collaboration with the Center for Strategic and International Studies (CSIS), Taylor Geospatial Engine, and the United States Geospatial Intelligence Foundation (USGIF), announces a sneak peek of, “The 2024 Commercial Remote Sensing Global Rankings,” an assessment of the world’s leading commercial space-based remote sensing systems.
The post The 2024 Commercial Remote Sensing Global Rankings appeared first on Taylor Geospatial Institute.
We are proud to announce the final release of STAC 1.1.0.
The focus has been the addition of a common band construct to unify the fields eo:bands and raster:bands. Additionally, the Item Asset Definition (field item_assets) - formerly a popular STAC extension - is now part of the core specification. Various additional fields have been made available via the common metadata mechanism, e.g. keywords, roles, data_type and unit. We collaborated closely with the editors of OGC API - Records to align better with STAC, which resulted, for example, in a change to the license field. The link object was extended to support additional HTTP mechanisms such as HTTP methods other than GET and HTTP headers. The best practices have evolved and various minor changes and clarifications were integrated throughout the specification.
A shoutout to all the participants and sponsors of the last STAC sprint in Philadelphia, who laid a solid basis for this release. Emmanuel Mathot and I were then funded by the STAC PSC to finalize the work. Thank you to everyone who made this possible.
The changes since v1.1.0-beta.1 are minor. We added media types to the best practices, clarified that item_assets in Collections are not required, and improved the description of the Statistics Object in common metadata.
Please read the release notes for all changes that have been made to the specification since v1.0.0. In the following sections, we’ll highlight the most important changes in the specification with some JSON snippets.
Note: The following information is very similar to the information provided in the v1.1.0-beta.1 blog post.
As of STAC 1.1, the bands array merges the similar but separate fields eo:bands and raster:bands, which was probably one of the most annoying things in STAC for historical reasons. The new bands field can be used in combination with property inheritance to provide users with more flexibility.
It should be relatively simple to migrate from STAC 1.0 (i.e. from eo:bands and/or raster:bands) to the new bands array. Usually, you can merge each object on a by-index basis. For some fields, you need to add the extension prefix of the eo or raster extension to the property name. Nevertheless, you should consider deduplicating properties with the same values across all bands to the Asset. Please also consider the Bands best practices when migrating from eo:bands and raster:bands; they provide more specific examples.
STAC 1.0 example:
{
"assets": {
"example": {
"href": "example.tif",
"eo:bands": [
{
"name": "r",
"common_name": "red"
},
{
"name": "g",
"common_name": "green"
},
{
"name": "b",
"common_name": "blue"
},
{
"name": "nir",
"common_name": "nir"
}
],
"raster:bands": [
{
"data_type": "uint16",
"spatial_resolution": 10,
"sampling": "area"
},
{
"data_type": "uint16",
"spatial_resolution": 10,
"sampling": "area"
},
{
"data_type": "uint16",
"spatial_resolution": 10,
"sampling": "area"
},
{
"data_type": "uint16",
"spatial_resolution": 30,
"sampling": "area"
}
]
}
}
}
After migrating to STAC 1.1 this is ideally provided as follows:
{
"assets": {
"example": {
"href": "example.tif",
"data_type": "uint16",
"raster:sampling": "area",
"raster:spatial_resolution": 10,
"bands": [
{
"name": "r",
"eo:common_name": "red",
},
{
"name": "g",
"eo:common_name": "green"
},
{
"name": "b",
"eo:common_name": "blue"
},
{
"name": "nir",
"eo:common_name": "nir",
"raster:spatial_resolution": 30
}
]
}
}
}
Apart from a much shorter and more readable list of bands, you’ll notice the following:
- The two band arrays were merged into a single bands array.
- common_name and spatial_resolution were renamed to include the extension prefixes.
- data_type and raster:sampling (renamed from sampling) were deduplicated to the Asset as the values were the same across all bands.
- spatial_resolution was also deduplicated, i.e. 10 is provided on the asset level, which is inherited by the bands unless explicitly overridden. Therefore, the nir band overrides the value 10 with a value of 30.
As a result, the new bands array is more lightweight and easier to handle.
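To make the by-index merge concrete, here is a minimal sketch in Python that operates on a plain asset dict. It is not part of any STAC library, and the prefixing rules are simplified to the fields used in the example above; treat it as an illustration under those assumptions, not a reference implementation.
def merge_bands(asset):
    """Merge eo:bands and raster:bands into a single bands array (STAC 1.1 style sketch)."""
    eo_bands = asset.pop("eo:bands", [])
    raster_bands = asset.pop("raster:bands", [])
    bands = []
    for i in range(max(len(eo_bands), len(raster_bands))):
        band = {}
        # Fields from eo:bands: "name" stays as-is, everything else gets the "eo:" prefix.
        for key, value in (eo_bands[i] if i < len(eo_bands) else {}).items():
            band[key if key == "name" else "eo:" + key] = value
        # Fields from raster:bands: "data_type" is common metadata now, the rest gets "raster:".
        for key, value in (raster_bands[i] if i < len(raster_bands) else {}).items():
            band[key if key == "data_type" else "raster:" + key] = value
        bands.append(band)
    # Hoist values that are identical across all bands up to the asset level.
    for key in (list(bands[0]) if bands else []):
        values = [band.get(key) for band in bands]
        if key != "name" and all(value == values[0] for value in values):
            asset[key] = values[0]
            for band in bands:
                band.pop(key, None)
    asset["bands"] = bands
    return asset
Applied to the STAC 1.0 example above this produces the merged structure, though it keeps the per-band spatial_resolution values rather than hoisting the shared 10 with a nir override, which is the more compact form shown in the 1.1 example.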
To make all this possible there were corresponding changes and releases for the following two extensions:
To better align with OGC API - Records, we slightly changed the license field. The license field additionally supports SPDX expressions and the value other. At the same time, the values proprietary and various were deprecated in favor of SPDX expressions and other. The new value other also solves an issue many data providers reported with the term proprietary, which was misleading for open licenses that were just not listed in the SPDX database.
The list of fields in the STAC common metadata model was extended, which partially was a result of the changes to the bands mentioned above.
The following fields were added:
- bands (see above)
- keywords (as formerly defined in Collections)
- roles (as formerly defined in Assets)
- data_type, nodata, statistics and unit (as formerly defined in the Raster extension)
Please also note that the specification was restructured a bit so that common elements such as Assets and Links are not defined in each specification (Catalog, Collection, Item) anymore, but instead, they are separately defined once in the commons folder.
The Link Object, used for the links field, has been extended to support more HTTP mechanisms. The additions were already specified in STAC API 1.0.0 but were forgotten to be added to STAC 1.0.0, so we are catching up now.
The following additional fields are available for links:
- method: The HTTP method (e.g. POST) to use for the links (defaults to GET)
- headers: The HTTP headers to send alongside the link request
- body: The request body to send for the link request (usually only applies to the POST, PUT and PATCH methods)
In addition to the extended Link Object, various smaller changes were made to link-related subjects:
- Links with relation type self are now validated and as such are required to be absolute.
- For links with relation type parent and root, a clarification was issued: Conceptually, STAC entities shall have no more than one parent entity. As such, STAC entities also can have no more than one root entity. So there’s usually just one link with a root or parent relationship, unless different variations of the same conceptual entity exist (identified by the ID). Different variations could be:
  - a different format (type property), e.g. a HTML version in addition to JSON
  - a different language (hreflang property), e.g. a German version in addition to English
- Similarly, it was clarified that multiple collections can point to an Item, but an Item can only point back to a single collection.
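As a purely illustrative example (the URL and values below are made up, not taken from the specification), an extended link using the new method, headers and body fields could look like this as a Python dict:
# Hypothetical link object using the new HTTP-related fields.
search_link = {
    "rel": "search",
    "type": "application/geo+json",
    "href": "https://example.com/stac/search",   # made-up endpoint
    "method": "POST",                             # defaults to GET when omitted
    "headers": {"Accept": "application/geo+json"},
    "body": {"collections": ["example-collection"], "limit": 100},
}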
The item_assets field that was previously an extension is now part of the STAC specification. It was probably the most commonly used extension and many extensions were defining schemas for it, so it is simpler to have it in the core specification.
No changes are required in the migration, although you can remove the schema URI of the extension (i.e. https://stac-extensions.github.io/item-assets/v1.0.0/schema.json) from the stac_extensions property. It doesn’t hurt to keep the schema URI though, and although the extension was deprecated, it can still be used in STAC 1.0.0. It just won’t get any updates in the future, because any changes will be directly integrated into STAC itself.
The following best practices were introduced or have changed, among them guidance around the thumbnail, overview and visual assets. Please consult the best practices document for details.
In addition to the changes in the core specification, we also updated some extensions. I already mentioned the deprecation of the Item Assets extension and the band-related changes in the EO and Raster extensions above.
An additional change in the raster extension is that the common band names have been extended; in particular, rededge was split into multiple common names (rededge071, rededge075 and rededge078) and green05 was added. This allows a direct mapping between the STAC common names and the Awesome Spectral Indices project.
We also released a significant change to the Projection extension. The change allows providing CRS codes for authorities other than EPSG, e.g. OGC or IAU. Previously only EPSG was supported via the proj:epsg field, e.g. "proj:epsg": 3857 for Web Mercator. The new version replaces the field with proj:code so that the authority is included in the code, e.g. "proj:code": "EPSG:3857".
Version 2.0.0 of the Projection extension removes and forbids the use of proj:epsg. If you want to migrate more gracefully, you can also migrate to the intermediate version 1.2.0, which deprecates proj:epsg and adds proj:code at the same time.
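A minimal sketch of that migration for a plain properties dict (assuming EPSG was the only authority in use, since proj:epsg could not express anything else):
def migrate_projection(properties):
    """Replace the removed proj:epsg field with proj:code, e.g. 3857 -> "EPSG:3857"."""
    epsg = properties.pop("proj:epsg", None)
    if epsg is not None and "proj:code" not in properties:
        properties["proj:code"] = "EPSG:{0}".format(epsg)
    return properties

print(migrate_projection({"proj:epsg": 3857}))  # {'proj:code': 'EPSG:3857'}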
Other minor clarifications:
- If a description is given, it can’t be empty
- start_datetime and end_datetime are inclusive bounds
Join the Taylor Geospatial Engine team on September 12, 2024 in St. Louis, MO for the 2024 Geo-Resolution conference. This year’s conference focuses on the development and application of geospatial models, such as digital twins, that can be used to address some of the world’s most critical challenges.
The post Join Us at Geo-Resolution 2024 appeared first on Taylor Geospatial Engine.
The PostGIS Team is pleased to release PostGIS 3.4.7! This is a bug fix release.
Have you ever got a really good piece of life advice from your dad? Something along the lines of “Good intentions matter, but your actions will define you.” That’s basically what a design principle is, but instead of your dad, it’s a designer. And instead of advice about life, it’s about the thing you're building.
The post Sparkgeo’s Design Principles appeared first on Sparkgeo.
John is a geospatial consultant based in Fremantle, Western Australia, where he runs Mammoth Geospatial, an open-source-focused GIS company. Specialising in open source GIS consulting and training, his career has taken him from BC & the Yukon to South America, PNG, the Pacific, and Australia. Deeply involved in the open geo community, John started Geogeeks Perth, chaired […]
The post Local FOSS4Gs are a great way to bring the magic to the community appeared first on GeoHipster.
This post introduces STAC GeoParquet, a specification and library for storing and serving SpatioTemporal Asset Catalogs (STAC) metadata as GeoParquet. By building on GeoParquet, STAC GeoParquet makes it easy to store, transmit, and analyze large collections of STAC items. It makes for a nice complement to a STAC API.
STAC makes geospatial data queryable, especially “semi-structured” geospatial data like a collection of cloud-optimized GeoTIFFs (COGs) from a satellite. I can’t imagine trying to work with this type of data without a STAC API.
Concretely, STAC metadata consists of JSON documents describing the actual assets. STAC metadata can typically be accessed in two ways:
- as a static catalog: a set of JSON files hosted on a web server or object storage
- through a STAC API: a web service that answers queries and returns matching items
In practice, I haven’t encountered much data distributed as static STAC catalogs. It’s perhaps useful in some cases, but for large datasets or datasets that are constantly changing, a pile of JSON files becomes slow and impractical for both the data provider and consumer. A STAC API is almost a necessity to work with this type of data.
That said, running a STAC API is a hassle (speaking from experience here). You need some kind of database to store the STAC metadata and web servers to handle the API requests. That database and those web servers need to be deployed, monitored, and maintained.
Finally, with either a static STAC catalog or an API, large collections of STAC items require you to move around a lot of JSON. That’s slow for the web servers to serialize, slow to send over the network, and slow to deserialize on your end.
STAC GeoParquet offers a nice format for easily and efficiently storing, transferring, and querying large amounts of homogenous STAC items.
The basic idea is to represent a STAC collection as a GeoParquet dataset, where each column is a field from the STAC item (id, datetime, eo:cloud_cover, etc.) and each row is an individual item.
The STAC GeoParquet specification describes how to convert between a set of STAC items and GeoParquet.
STAC GeoParquet optimizes for certain use cases by leveraging the strengths of the Apache Parquet file format, at the cost of some generality.
Parquet is a columnar file format, so all the records in a STAC GeoParquet dataset need to have the same schema. The more homogenous the items, the more efficiently you’ll be able to store them. In practice, this means that all the items in a collection should have the same properties available. This is considered a best practice in STAC anyway, but there may be some STAC collections that can’t be (efficiently) stored in stac-geoparquet. This is discussed in detail in Schema Considerations.
STAC GeoParquet (and Parquet more generally) is optimized for bulk and analytic use cases. Tabular data analysis libraries (like pandas, Dask, DuckDB, etc.) can read and efficiently query Parquet datasets. In particular, the Parquet file format’s support for statistics and partitioning can make certain access patterns extremely fast. A STAC GeoParquet dataset might be partitioned by time and space (using a quadkey, for example), letting you efficiently load subsets of the items. And Parquet is a columnar file format, so loading a subset of columns is fast and easy. These aspects of the file format pair nicely with cloud-native workflows, where HTTP range requests mean that you don’t even need to download data that your workflow would just filter out anyway.
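As a hedged sketch of what such a column subset plus filter looks like with pyarrow (the file name and column names are placeholders for your own dataset):
import pyarrow.parquet as pq

# Load only three columns and push the cloud-cover predicate down to the file,
# so row groups that cannot match are skipped entirely.
table = pq.read_table(
    "items.parquet",                                 # placeholder path
    columns=["id", "datetime", "eo:cloud_cover"],
    filters=[("eo:cloud_cover", "<", 10)],
)
print(table.num_rows)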
You likely wouldn’t want to do “point” reads, where you look up an individual item by ID, from a STAC GeoParquet dataset. Databases like Postgres are much better suited for that type of workload.
And while STAC GeoParquet might not be a good on-disk format for a STAC API serving many small queries, it can still play an important role as a transmission format for queries returning large result sets. If a user makes a request that returns many items, it will be faster to transmit those results as STAC GeoParquet rather than JSON, thanks to Parquet’s more efficient compression and serialization options.
One neat feature is the ability to embed the Collection metadata in the Parquet file’s metadata. This gives you a great single-file format for moving around small to medium sized collections (large collections may need to be partitioned into multiple files, but can still be treated as a single dataset by Parquet readers).
In summary, JSON and Parquet are very different file formats that are appropriate for different use-cases. JSON is record oriented, while Parquet is column oriented. JSON is flexible with respect to types and schemas, while Parquet is strict (which can make building a STAC GeoParquet dataset from a collection of STAC items difficult). STAC GeoParquet inherits all these properties, which affects the use-cases it’s appropriate for.
As a simple example, we’ll look at what it takes to access one month’s worth of sentinel-2-l2a items from the Planetary Computer’s sentinel-2-l2a collection. January, 2020 had about 267,880 items.
With some clever code to parallelize requests to the STAC API, we can fetch those items in about 160 seconds.
>>> t0 = time.time()
>>> futures = [search(client, period) for period in periods]
>>> features_nested = await asyncio.gather(*futures)
>>> features = list(itertools.chain.from_iterable(features_nested))
>>> t1 = time.time()
>>> print(f"{t1 - t0:0.2f}")
162.16
The search method is a couple dozen lines of moderately complex, async Python. Out of curiosity, I serialized that to disk as (uncompressed) ndjson, and it took up about 4.5 GB of space.
With the stac-geoparquet Python library, we can convert the JSON items to GeoParquet:
>>> import stac_geoparquet
>>> rbr = stac_geoparquet.arrow.parse_stac_items_to_arrow(features)
>>> stac_geoparquet.arrow.to_parquet(rbr, "sentinel-2-l2a.parquet")
That takes only 260 MB on disk. It can be read with a simple:
>>> table = pyarrow.parquet.read_table("sentinel-2-l2a.parquet")
which finishes in just under 5 seconds. That’s not entirely a fair comparison to the 160 seconds from the API, since I’m loading that from disk rather than the network, but there’s ample room to spare.
The STAC GeoParquet Python library can also write to the Delta Table format.
Once in GeoParquet, various clients can query the dataset. If you’re a fan of SQL, DuckDB supports reading Parquet files:
$ duckdb
D select * from 'sentinel-2-l2a.parquet' where "eo:cloud_cover" < 10 limit 10;
Or using ibis we can get the average cloudiness per platform for each hour:
>>> counts = (
... ibis.read_parquet("sentinel-2-l2a.parquet")
... .group_by(
... _.platform,
... hour=_.datetime.truncate("h")
... ).aggregate(cloud_cover=_["eo:cloud_cover"].mean(), count=_.count())
... .order_by(["platform", "hour"])
... )
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┓
┃ platform ┃ hour ┃ cloud_cover ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━┩
│ string │ timestamp │ float64 │ int64 │
├─────────────┼─────────────────────┼─────────────┼───────┤
│ Sentinel-2A │ 2020-01-01 02:00:00 │ 63.445030 │ 192 │
│ Sentinel-2A │ 2020-01-01 04:00:00 │ 26.992815 │ 57 │
│ Sentinel-2A │ 2020-01-01 05:00:00 │ 72.727221 │ 95 │
│ Sentinel-2A │ 2020-01-01 06:00:00 │ 0.097324 │ 10 │
│ Sentinel-2A │ 2020-01-01 07:00:00 │ 54.984660 │ 131 │
│ Sentinel-2A │ 2020-01-01 10:00:00 │ 48.270195 │ 161 │
│ Sentinel-2A │ 2020-01-01 11:00:00 │ 97.241751 │ 27 │
│ Sentinel-2A │ 2020-01-01 12:00:00 │ 70.159764 │ 131 │
│ Sentinel-2A │ 2020-01-01 14:00:00 │ 47.591773 │ 388 │
│ Sentinel-2A │ 2020-01-01 15:00:00 │ 50.362548 │ 143 │
│ … │ … │ … │ … │
└─────────────┴─────────────────────┴─────────────┴───────┘
So, in all, STAC GeoParquet offers a very convenient and high-performance way to distribute large STAC collections, provided the items in that collection are pretty homogenous (which they probably should be, for your users’ sake). It by no means replaces the need for a STAC API in all use cases. Databases like Postgres are really good at certain workloads. STAC GeoParquet complements a STAC API by handling the bulk-access use case that a typical JSON-based REST API struggles with. And if you just need to distribute a relatively static collection of STAC items, putting STAC GeoParquet on Blob Storage strikes a really nice balance between hardship for the producer and usefulness for the consumer.
Abstract:
Human mobility data science using trajectories or check-ins of individuals has many applications. Recently, we have seen a plethora of research efforts that tackle these applications. However, research progress in this field is limited by a lack of large and representative datasets. The largest and most commonly used dataset of individual human trajectories captures fewer than 200 individuals while data sets of individual human check-ins capture fewer than 100 check-ins per city per day. Thus, it is not clear if findings from the human mobility data science community would generalize to large populations. Since obtaining massive, representative, and individual-level human mobility data is hard to come by due to privacy considerations, the vision of this paper is to embrace the use of data generated by large-scale socially realistic microsimulations. Informed by both real data and leveraging social and behavioral theories, massive spatially explicit microsimulations may allow us to simulate entire megacities at the person level. The simulated worlds, which do not capture any identifiable personal information, allow us to perform “in silico” experiments using the simulated world as a sandbox in which we have perfect information and perfect control without jeopardizing the privacy of any actual individual. In silico experiments have become commonplace in other scientific domains such as chemistry and biology, permitting experiments that foster the understanding of concepts without any harm to individuals. This work describes challenges and opportunities for leveraging massive and realistic simulated alternate worlds for in silico human mobility data science.
Key Words: Spatial Simulation, Mobility Data Science, Trajectory Data, Location Based Social Network Data, In Silico
Figure: The Patterns of Life Simulation. A video of the simulation can be found at: https://www.youtube.com/watch?v=rP1PDyQAQ5M.
Figure: Envisioned framework for a simulation that exhibits both realistic behavior and realistic movement.
Full reference:
Züfle, A., Pfoser, D., Wenk, C., Crooks, A.T., Kavak, H., Anderson, T., Kim, J-S., Holt, N. and Diantonio, A. (2024), In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data (Vision Paper), Transactions on Spatial Algorithms and Systems (pdf).
The GRASS GIS 8.4.0 release provides more than 520 improvements and fixes with respect to the release 8.3.2.
The post GRASS GIS 8.4.0 released appeared first on Markus Neteler Consulting.
Taylor Geospatial Engine and the core team of the Field Boundaries for Agriculture (fiboa) project are happy to share another technical update from Matthias Mohr and the Cloud-Native Geospatial Foundation on the continued development of open source tools to accelerate innovation in AI and computer vision models to extract field boundaries from earth observation imagery.
The post Creating Interoperable Field Boundary Data with the fiboa Converter Tool | Cloud-Native Geospatial Foundation appeared first on Taylor Geospatial Engine.
Part 1 This is part 2 which will be pretty simple. This is more of the “get organized” part. In part 1 I was able to generate a watershed boundary from LIDAR Elevation Data. After it was generated I went back and checked the watershed line and really only found one thing that looked weird. […]
The post Watershed Geo Part 2 appeared first on North River Geographic Systems Inc.
It’s already been three years since the last release of the STAC specification, and it’s time to improve it based on the feedback we have received from the STAC community since then. After an intense period of discussion and document editing, we are proud to announce the release of STAC 1.1.0-beta.1.
The focus has been the addition of a common band construct to unify the fields eo:bands and raster:bands. Additionally, the Item Asset Definition (field item_assets) - formerly a popular STAC extension - is now part of the core specification. Various additional fields have been made available via the common metadata mechanism, e.g. keywords, roles, data_type and unit. We collaborated closely with the editors of OGC API - Records to align better with STAC, which resulted, for example, in a change to the license field. The link object was extended to support additional HTTP mechanisms such as HTTP methods other than GET and HTTP headers. The best practices have evolved and various minor changes and clarifications were integrated throughout the specification.
This is a beta release with the option to make changes before a final 1.1.0 release if the feedback of the STAC community asks for it in the next weeks. At the time of this beta release, no further changes are planned. You can find the open issues for 1.1.0 in the GitHub issue tracker if any get submitted in the next weeks.
We’d appreciate it if the STAC community takes some time to implement the new version and give feedback via the GitHub issue tracker so that we can ensure the new version is mature and well-received by the community. We hope that the STAC ecosystem catches up with the changes in the specification in the next weeks and months, any help that you can provide is highly appreciated. If you want to fund the STAC work we do, please get in touch, too.
A shoutout to all the participants and sponsors of the last STAC sprint in Philadelphia, who laid a solid basis for this release. Emmanuel Mathot and I were then funded by the STAC PSC to finalize the work. Thank you to everyone who made this possible.
Please read the Changelog for all changes that have been made to the specification since v1.0.0. In the following sections, we’ll highlight the most important changes in the specification with some JSON snippets.
As of STAC 1.1, the bands array merges the similar but separate fields eo:bands and raster:bands, which was probably one of the most annoying things in STAC for historical reasons. The new bands field can be used in combination with property inheritance to provide users with more flexibility.
It should be relatively simple to migrate from STAC 1.0 (i.e. from eo:bands and/or raster:bands) to the new bands array. Usually, you can merge each object on a by-index basis. For some fields, you need to add the extension prefix of the eo or raster extension to the property name. Nevertheless, you should consider deduplicating properties with the same values across all bands to the Asset. Please also consider the Bands best practices when migrating from eo:bands and raster:bands; they provide more specific examples.
STAC 1.0 example:
{
"assets": {
"example": {
"href": "example.tif",
"eo:bands": [
{
"name": "r",
"common_name": "red"
},
{
"name": "g",
"common_name": "green"
},
{
"name": "b",
"common_name": "blue"
},
{
"name": "nir",
"common_name": "nir"
}
],
"raster:bands": [
{
"data_type": "uint16",
"spatial_resolution": 10,
"sampling": "area"
},
{
"data_type": "uint16",
"spatial_resolution": 10,
"sampling": "area"
},
{
"data_type": "uint16",
"spatial_resolution": 10,
"sampling": "area"
},
{
"data_type": "uint16",
"spatial_resolution": 30,
"sampling": "area"
}
]
}
}
}
After migrating to STAC 1.1 this is ideally provided as follows:
{
"assets": {
"example": {
"href": "example.tif",
"data_type": "uint16",
"raster:sampling": "area",
"raster:spatial_resolution": 10,
"bands": [
{
"name": "r",
"eo:common_name": "red",
},
{
"name": "g",
"eo:common_name": "green"
},
{
"name": "b",
"eo:common_name": "blue"
},
{
"name": "nir",
"eo:common_name": "nir",
"raster:spatial_resolution": 30
}
]
}
}
}
Apart from a much shorter and more readable list of bands, you’ll notice the following:
- The two band arrays were merged into a single bands array.
- common_name and spatial_resolution were renamed to include the extension prefixes.
- data_type and raster:sampling (renamed from sampling) were deduplicated to the Asset as the values were the same across all bands.
- spatial_resolution was also deduplicated, i.e. 10 is provided on the asset level, which is inherited by the bands unless explicitly overridden. Therefore, the nir band overrides the value 10 with a value of 30.
As a result, the new bands array is more lightweight and easier to handle.
To make all this possible there were corresponding changes and releases for the following two extensions:
To better align with OGC API - Records, we slightly changed the license field. The license field additionally supports SPDX expressions and the value other. At the same time, the values proprietary and various were deprecated in favor of SPDX expressions and other. The new value other also solves an issue many data providers reported with the term proprietary, which was misleading for open licenses that were just not listed in the SPDX database.
The list of fields in the STAC common metadata model was extended, which partially was a result of the changes to the bands mentioned above.
The following fields were added:
- bands (see above)
- keywords (as formerly defined in Collections)
- roles (as formerly defined in Assets)
- data_type, nodata, statistics and unit (as formerly defined in the Raster extension)
Please also note that the specification was restructured a bit so that common elements such as Assets and Links are not defined in each specification (Catalog, Collection, Item) anymore, but instead, they are separately defined once in the commons folder.
The Link Object, used for the links field, has been extended to support more HTTP mechanisms. The additions were already specified in STAC API 1.0.0 but were forgotten to be added to STAC 1.0.0, so we are catching up now.
The following additional fields are available for links:
- method: The HTTP method (e.g. POST) to use for the links (defaults to GET)
- headers: The HTTP headers to send alongside the link request
- body: The request body to send for the link request (usually only applies to the POST, PUT and PATCH methods)
In addition to the extended Link Object, various smaller changes were made to link-related subjects:
- Links with relation type self are now validated and as such are required to be absolute.
- For links with relation type parent and root, a clarification was issued: Conceptually, STAC entities shall have no more than one parent entity. As such, STAC entities also can have no more than one root entity. So there’s usually just one link with a root or parent relationship, unless different variations of the same conceptual entity exist (identified by the ID). Different variations could be:
  - a different format (type property), e.g. a HTML version in addition to JSON
  - a different language (hreflang property), e.g. a German version in addition to English
- Similarly, it was clarified that multiple collections can point to an Item, but an Item can only point back to a single collection.
The item_assets field that was previously an extension is now part of the STAC specification. It was probably the most commonly used extension and many extensions were defining schemas for it, so it is simpler to have it in the core specification.
No changes are required in the migration, although you can remove the schema URI of the extension (i.e. https://stac-extensions.github.io/item-assets/v1.0.0/schema.json) from the stac_extensions property. It doesn’t hurt to keep the schema URI though, and although the extension was deprecated, it can still be used in STAC 1.0.0. It just won’t get any updates in the future, because any changes will be directly integrated into STAC itself.
The following best practices were introduced or have changed, among them guidance around the thumbnail, overview and visual assets. Please consult the best practices document for details.
In addition to the changes in the core specification, we also updated some extensions. I already mentioned the deprecation of the Item Assets extension and the band-related changes in the EO and Raster extensions above.
An additional change in the raster extension is that the common band names have been extended; in particular, rededge was split into multiple common names (rededge071, rededge075 and rededge078) and green05 was added. This allows a direct mapping between the STAC common names and the Awesome Spectral Indices project.
We also released a significant change to the Projection extension. The change allows providing CRS codes for authorities other than EPSG, e.g. OGC or IAU. Previously only EPSG was supported via the proj:epsg field, e.g. "proj:epsg": 3857 for Web Mercator. The new version replaces the field with proj:code so that the authority is included in the code, e.g. "proj:code": "EPSG:3857".
Version 2.0.0 of the Projection extension removes and forbids the use of proj:epsg. If you want to migrate more gracefully, you can also migrate to the intermediate version 1.2.0, which deprecates proj:epsg and adds proj:code at the same time.
Other minor clarifications:
- If a description is given, it can’t be empty
- start_datetime and end_datetime are inclusive bounds
In May, we discussed Field Boundaries for Agriculture (fiboa) and the fiboa ecosystem and mentioned that there is a new converter tool, which can take non-fiboa datasets and help you turn them into fiboa datasets. Back then we had 5 very similar datasets converted. In the meantime, we’ve converted additional datasets and improved the converter tool. Today, we’d like to give an update on the status and show how easy it is for you to make your field boundaries more useful by converting and providing them in a “standardized” format.
Seven people are currently working on creating more than 40 converters:
How does it work that we can convert so many datasets so easily?
We have implemented the fiboa Command Line Interface (CLI), which is a program that offers various tools to work with field boundary data. One of them is a command to convert field boundary datasets from their original form into fiboa. As field boundary data usually looks very similar conceptually (a geometry plus additional properties), most of the steps that are needed to convert the datasets to a standardized format can be abstracted away.
This means that the converter just needs a couple of instructions that describe the source data and then it can do all other steps automatically. The code is in Python, but we provide a template that is documented and contains examples. It’s pretty simple to fill and most people that have some programming experience should be able to make the necessary changes. Let’s look at the most important steps in the conversion process:
Read the data from the files in various formats such as Shapefiles, GeoPackages, GeoParquet files, GeoJSON, etc. The files can be loaded from disk or from the Internet. It can handle multiple source files, extract from ZIP files, etc. For example:
SOURCES = "https://sla.niedersachsen.de/mapbender_sla/download/FB_NDS.zip"
This downloads the data from the given URL and loads the data from the files contained in the ZIP archive.
Run filters to remove rows that shall not be in the final data, for example:
COLUMN_FILTERS = {
"boundary_type": lambda column: column == "agricultural_field"
}
This keeps only the geometries for those boundaries that are of type agricultural_field.
Add properties (i.e. columns) with additional information, for example:
ADD_COLUMNS = {
"determination_method": "auto-imagery"
}
This adds the information that the boundaries were detected by an ML algorithm to the predefined property determination_method.
Rename and/or remove properties, for example:
COLUMNS = {
"fid": "id",
"geometry": "geometry",
"area_sqm": "area",
"custom_property": "custom_property"
}
This is the list of properties that you want to keep from the source data (here: fid, geometry and area_sqm). The given properties will be renamed to id, geometry and area, which are predefined properties in fiboa. All other properties in the source data would get removed.
Add custom properties, for example:
MISSING_SCHEMAS = {
"properties": {
"custom_property": {
"type": "string",
"enum": ["A","B","C"]
}
}
}
This defines that your custom property with the name custom_property, which is not predefined in fiboa or an extension, is of type string and can be any of the uppercase letters A, B or C (or null).
This is probably the most complex task, as it requires defining a schema for every custom property that you want to provide in addition to the predefined properties in fiboa or its extensions.
You can find more information about the schemas in the fiboa Schema specification.
Change the data values, for example:
COLUMN_MIGRATIONS = {
"area_sqm": lambda column: column / 10000
}
This would convert the values for the property area_sqm from square meters to hectares (as the area property in fiboa requires the area to be in hectares).
Create a file with additional metadata (i.e. a STAC Collection with description, license, provider information, etc.). That just requires updating some variables, for example:
ID = "de"
SHORT_NAME = "Germany"
TITLE = "Field boundaries for Germany"
LICENSE = "CC-BY-4.0"
...
Finally, write the data to a GeoParquet file.
Additionally, you should provide some general metadata about the dataset (e.g., title, description, provider, license) so that people know what they can expect. That’s not really needed for the conversion though, so I don’t cover it here. The template also provides additional options and parameters that can be fine-tuned for your needs. You can have a look at the full template. It has documentation included and gives some examples. For even more examples check out the filled templates for the existing converters. There’s also a tutorial that describes how to create a converter, either in written form or as a YouTube video.
Once this is done, you can make the converter available by creating a Pull Request on GitHub, and eventually it will be available to others. Everyone with access to the data can then convert to fiboa at any time, e.g. if the source data has been updated. It’s really simple. To convert data you can just use the following command:
fiboa convert X -o result.parquet
Just replace X with any of the available converters. Use the command fiboa converters to list all available converters.
The result.parquet file can then be loaded into any software that can read Parquet files, for example QGIS. It can also be loaded in many programming languages such as Python or R. Loading multiple fiboa-compliant datasets makes it much simpler to work across datasets, as many properties are already aligned; the data is analysis-ready and can be used quickly and efficiently.
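For example, a quick look at the converted file with GeoPandas (assuming GeoPandas with Parquet support is installed; the column names follow the converter example above):
import geopandas as gpd

# Load the converted fiboa GeoParquet file and inspect the standardized columns.
fields = gpd.read_parquet("result.parquet")
print(fields.columns.tolist())
print(fields[["id", "area"]].head())  # "area" is in hectares per the fiboa schema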
This makes it very easy to work with massive amounts of field boundary data, even across providers, and work against a standardized interface without the need to manually preprocess the data heavily to be comparable. For more details see also the section “Why a farm field boundary data schema?” in our blog post “Introduction to fiboa”.
Join us and make your datasets available in a standardized format now! We are happy to help if you run into issues, just get in touch via email or GitHub issues or pull requests.
TGE selected Dr. Hannah Kerner as an academic seed grant awardee for the Field Boundaries for Agriculture initiative. Dr. Kerner is an Assistant Professor in the School of Computing and Augmented Intelligence at Arizona State University. Dr. Kerner is pioneering new machine learning techniques to leverage remote sensing data in addressing global challenges such as food insecurity and climate change.
The post Innovation Bridge Community Spotlight: Dr. Hannah Kerner appeared first on Taylor Geospatial Engine.
Yes, remote sensing can be used for automated trading, primarily by integrating geospatial data into trading algorithms to provide insights and predictive signals. In fact, trading apps like Immediate Edge use this type of technology. Here are some ways remote sensing can be applied in automated trading: Agricultural Monitoring: Satellite imagery can monitor crop health, […]
The post Can Remote Sensing Be Used for Automated Trading? first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
TL/DR – Funding is Hard. Last week I ran down a rabbit hole. While I’m passable as a sysadmin – my skills are lacking in some areas. I had a conversation with someone on hardware and software and we ended on “How would you run this 911 stack of software if you had the choice?”. […]
The post Geoserver Rabbit Holes appeared first on North River Geographic Systems Inc.
Tell Us About Yourself I’m a GIS Analyst at Summit Design and Engineering Services where I manage data and collection processes for projects focused on asset maintenance. I started my career in GIS on a whim; I was thinking of getting my masters in Geology but realized I would need to fill a GIS-sized hole in […]
The post Maps and Mappers of the 2024 Calendar – Katherine Rudzki – July appeared first on GeoHipster.
Today, we celebrate a true geospatial legend: GRASS GIS!
The post Happy 41st birthday, GRASS GIS! appeared first on Markus Neteler Consulting.
Making sense of huge amounts of remote sensing data is a job that many companies are working hard to solve. TorchGeo aims to fill the gap between deep learning and remote sensing.
The post Remote Sensing and Computer Vision: TorchGeo Data Loading appeared first on Sparkgeo.
So this will be my last “New Class” announcement for a bit. Back in the spring of 2024, I taught a 4 hour QGIS and LIDAR class at the TNGIC meeting (State of TN GIS meeting). I ran through it and shelved it and now I’m fixing the problem spots and smoothing down the rough […]
The post QGIS and LIDAR appeared first on North River Geographic Systems Inc.
Through a long series of events – here we are: https://qgis-us.org The events: So – we have a new domain and a new website. So what does that mean? While at best the US group isn’t very active – maybe this can spur some more activity. Anyway – In honor of the ESRI UC – […]
The post QGIS US Users Group appeared first on North River Geographic Systems Inc.
Over two years ago, the GeoParquet project brought together a diverse group of interests around a clear objective: standardizing how geospatial data is used within Parquet. The initial goal was modest: to ensure that any tool reading or writing spatially located geometries (points, lines and polygons) does so in a consistent and interoperable way.
But the ultimate goal of the effort has been to make geospatial a primary data type within the broader data community, thereby breaking the ‘GIS’ data silo and enabling the seamless integration of geospatial data with all other data types. We envision a world where spatial data is simply another column in your dataset, not a special case requiring unique handling. This integration will unlock new insights, reduce the need for specialized tools, and make geospatial information accessible to a broader range of users and innovations.
Without standardization, the current situation is that geospatial datasets are often non-interoperable. Spatial data might be a column in each system, but moving data between systems requires extensive overhead to transform it properly due to insufficient metadata. Naively adding geospatial types to big data systems often leads to poor performance because the necessary metadata and indexes for effective spatial operations are not considered. This fragmentation and inefficiency highlight the urgent need for standardized approaches.
Since its inception, the GeoParquet group has launched versions 1.0 and 1.1, witnessing significant adoption across various tools (over 20 tools and libraries implement the specification) and datasets.
A few of the organizations that provide software libraries and tools for GeoParquet. Their contributions are building a robust ecosystem for geospatial data management and analysis using GeoParquet.
One of the main design goals for the GeoParquet specification was to make it as easy as possible for a non-geospatial expert to implement, while also providing the ability to properly handle any of the obscure requirements that geospatial experts need for critical applications.
If someone has a bunch of longitude and latitude GPS points they should be easily able to figure out how to store them in GeoParquet without having to understand coordinate reference systems, polygon winding orders and spherical edges. But that point data should work seamlessly when it’s joined with data exported from 3 different national governments who all use different projections, in a system that needs the epoch right because it requires sub-centimeter accuracy in an area where the movement of the continental plates affects the output.
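To make the simple end of that spectrum concrete, here is a minimal sketch (assuming GeoPandas, which writes GeoParquet via to_parquet; the file name and attribute values are illustrative):

```python
# Minimal sketch: storing plain longitude/latitude points as GeoParquet with GeoPandas.
# File name and attributes are illustrative.
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"name": ["stop_a", "stop_b"]},
    geometry=[Point(-6.06, 53.35), Point(-6.26, 53.34)],  # longitude/latitude order
    crs="EPSG:4326",
)
gdf.to_parquet("gps_points.parquet")  # GeoParquet metadata (CRS, encoding) is written with sensible defaults
```

The point is that a mapper with GPS points never has to touch winding orders or epochs, while the defaults written into the file still describe them unambiguously for systems that do care.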
Getting the right balance of simplicity and complexity for GeoParquet has involved extensive discussion for each of the resulting 8 metadata fields. One major goal for each field was to establish good default values, so that systems that did not have complex requirements could safely ignore them while also naturally doing the right thing when writing out the data.
The resulting collection of metadata fields ensures that geospatial data transferred across systems can be fully understood without ambiguity or errors. Our hope is that these fields can be leveraged by other systems that wish to add geospatial support, enabling them to start simple by hardcoding a smaller number of acceptable values while still beginning with the right fields to handle all the nuance of the geospatial world.
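For readers who want to see those fields in a real file, a rough way to peek at them is shown below. GeoParquet stores its metadata as a JSON document under the "geo" key of the Parquet file metadata; the path here is a placeholder.

```python
# Sketch: inspecting the GeoParquet metadata of an existing file with PyArrow.
# "places.parquet" is a placeholder path.
import json
import pyarrow.parquet as pq

file_meta = pq.read_metadata("places.parquet").metadata  # Parquet key/value metadata (bytes keys)
geo = json.loads(file_meta[b"geo"])                      # GeoParquet keeps its JSON metadata under the "geo" key

print(geo["version"], geo["primary_column"])
for name, col in geo["columns"].items():
    # Per-column fields such as encoding and geometry_types carry the defaults discussed above
    print(name, col.get("encoding"), col.get("geometry_types"), "crs" in col)
```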
Initially, we added this metadata as an extension in the form of a JSON string within the Parquet file metadata, as that was Parquet’s only available extension point. However, with growing interest in geospatial support within the data community, it’s time to refine our strategy. Parquet is not alone in the data ecosystem. The rise of open table formats in data lakes and other technologies makes it clear that spatial types need to be fully handled at all layers.
Last month, we organized a meetup in San Francisco, inviting people working on various technologies like Parquet, Arrow, Iceberg, Delta, and others interested in adding geospatial support. The consensus was the need for a coordinated approach to ensure geospatial types are handled correctly across all levels of the stack, in order to avoid interoperability issues between different layers and necessary transformations.
Currently, groups are working on adding geospatial capabilities to Arrow, Parquet, Iceberg, Delta, etc. Our proposal is to coordinate these efforts. We suggest leveraging the research and discussions from the GeoParquet group, as documented in its specification (with extensive justification for the decisions available in the issues and pull requests). The ideal outcome is that GeoParquet itself becomes unnecessary, with geospatial being treated as a primary data type in all relevant formats and protocols, accompanied by the right metadata.
The tentative conclusion of our meeting in San Francisco was to start with the standardization of Well-Known Binary (WKB) support in Arrow, Parquet, and Iceberg, representing these as native types across these technologies. This likely consists of three main tasks:
And getting these basics working is just the beginning. The potential of a second phase is to go beyond WKB and align on optimized geometry encodings that fit more directly into the paradigms of modern data formats and protocols, enabling more performance and efficiency in an interoperable way.
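As a rough illustration of the WKB-first idea (not a reference implementation of any of these specifications), geometries can already travel as a plain binary column today; a native geometry type would add typed metadata on top of exactly this kind of column. Names below are made up.

```python
# Sketch: geometries serialized as WKB and stored in an ordinary binary Parquet column.
import pyarrow as pa
import pyarrow.parquet as pq
from shapely import to_wkb
from shapely.geometry import Point

points = [Point(-122.42, 37.77), Point(-73.97, 40.78)]
table = pa.table({
    "name": ["san_francisco", "new_york"],
    "geometry": [to_wkb(p) for p in points],  # WKB bytes in a binary column
})
pq.write_table(table, "points_wkb.parquet")
```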
Finally, this initial path of interoperability is thus far only focused on geospatial vector data. Similar initiatives will be needed to fit other types of geospatial data such as raster, point clouds, discrete global grid systems (such as H3 and S2) into the mainstream data formats & protocols.
We call on the community to combine these initiatives into a single effort, ensuring all the pieces fit together seamlessly. We propose to center this effort around the existing GeoParquet community group, which meets bi-weekly and has already conducted extensive discussions. The task of integrating geospatial types into various data stacks will not be simple, but we have a clear roadmap compatible with all existing initiatives.
If you are working on geospatial support in any relevant data technology, please consider joining the GeoParquet meetup group (suggestions are welcome for a new name that reflects the broader collaboration) and collaborate with others - just email requests [at] geoparquet.org and ask to be added to the calendar invite. Now that geospatial is being added to many standards and protocols, we have the opportunity to coordinate our efforts and establish robust geospatial support from the start, reducing frictions and limitations for years to come.
Interested in seeing this happening and can provide financial support? We also want to hear from you. There is already some initial funding available from OGC, sponsored by Planet and CARTO, to push forward Iceberg support for geospatial.
The future for geospatial is bright! Let’s work together to ensure it integrates seamlessly into the broader data ecosystem.
Check out the notes from the San Francisco meeting: GeoParquet in person meetup at Data + AI conference
I work with two counties in TN with the TN NG911 standard. Flash back 10 years and I hadn’t really grasped how complicated addressing is. Do I like it? Mostly yes. Occasionally I get questions from outside TN and I demo the two counties and how that works. A while back someone called and […]
The post NENA 911 Database in QGIS appeared first on North River Geographic Systems Inc.
It’s that time of year. In order to make this smoother this year, the first official calendar recipient is me, to make sure there are no glaring errors like “Hey it’s not 1984”. We’re excited to issue the call for map contributions for the 2025 GeoHipster calendar. Entries are subject to these rules, and […]
The post 2025 Calendar Submissions appeared first on GeoHipster.
I typically spend June wondering about what I’m doing, have done, and need to do work wise. One thing I keep thinking about is “the old days”. By old days I mean the start of my career which was doing a lot of Watershed Mapping. Back in the mid 90’s we would be approached with […]
The post Watershed GIS Part 1 appeared first on North River Geographic Systems Inc.
The PostGIS Team is pleased to release PostGIS 3.5.0alpha2! Best Served with PostgreSQL 17 Beta2 and GEOS 3.12.2.
This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4-1.5 is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5 is needed.
This release is an alpha of a major release; it includes bug fixes since PostGIS 3.4.2 as well as new features.
The PostGIS Team is pleased to release PostGIS 3.5.0alpha1! Best Served with PostgreSQL 17 Beta2 and GEOS 3.12.2.
This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.5.0+ is needed.
This release is an alpha of a major release; it includes bug fixes since PostGIS 3.4.2 as well as new features.
I had a boss that would always start off with “So the Short story is….” and you’d be stuck for at least 30 minutes and maybe an hour listening to a story that didn’t go anywhere. So the short story on the Mergin Maps Class………. NRGS is a Mergin Maps partner. A little over a […]
The post Mergin Maps Training Class appeared first on North River Geographic Systems Inc.
The climate community has long developed reliable climate models grounded in trusted Earth systems data and physics, but it has not been until recently that human dynamics and feedbacks have been viewed as a necessary coupling within these models. Including human dynamics within integrated models necessitates a forecasted understanding of human transitions within the landscape. The geospatial science domain has typically not looked forward through simulations. Advances in agent-based modeling, synthetic population generation, and GeoAI/GenAI are presenting new opportunities for generating future-oriented representations of human landscapes, enabling the development of scenario-specific forecasted datasets, such as synthetic satellite imagery, land cover/land use, the built environment, and more. This session will explore the boundaries of geospatial modeling, data synthesis, and microsimulations for forecasting. Emphasis will be placed on research and studies that show how synthetic forecasted data can enable high-fidelity assessments of climate futures and population impacts.
Crooks, A.T. (2024), Michael Batty, in Gilmartin, M., Hubbard, P., Kitchin, R. and Roberts, S. (eds.), Key Thinkers on Space and Place (3rd edition), Sage, London, UK. pp. 37-43. (pdf)
Accurate distance and angle measurements on Earth’s surface are the focus of land surveying, both an art and a science. The precision of survey measurements is affected by several variables, and even a small inaccuracy might have far-reaching consequences for the design. Using a GIS makes data collection, planning, and management much more efficient. The […]
The post Applications of GIS in surveying first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
As Earth observation data becomes more abundant and diverse, the Earth Observation user community has spent considerable effort trying to find a common definition of “Analysis-Ready Data” (ARD). One of the most obvious reasons this is hard is that it relies on the assumption that we can predict what kind of analysis a user wants to perform. Certainly, someone using satellite imagery to analyze evapotranspiration is going to need something very different from someone trying to detect illegal mines.
Despite this, we believe there is some degree of preprocessing, metadata provision, and harmonization that will be useful for most users to move more quickly.
This blog post is an overview of our current thinking on ARD based on work we’ve been doing with NASA, the Committee on Earth Observation Satellites (CEOS) Systems Engineering Office, and others over the past year.
Some of the challenges of aligning around a definition of ARD have been solved by adoption of the SpatioTemporal Asset Catalogs (STAC) metadata specification. In an ideal case, STAC metadata allows users to load data easily into a variety of configurations that might suit their needs – e.g., into a datacube.
STAC was designed to be flexible and has an intentionally small core that can be added to via extensions. Many STAC extensions have been developed, but there is no clear guidance on which extensions could be added to a STAC metadata catalog to create something that would be considered ARD. The best extensions to enable ARD may not even exist yet.
We need to create best practices that define which STAC extensions should be used (or developed) to signal that data should be considered analysis-ready. If we do this, we could use the concept of a STAC “profile” to define which combination of STAC extensions should be used and validated to create ARD data.
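As a loose sketch of what such a profile could look like in practice (the required extension list below is purely illustrative and is not an agreed CEOS-ARD profile):

```python
# Sketch of the "profile" idea: an ARD profile as a required set of STAC extension schemas.
# The schema URLs below are illustrative placeholders, not an agreed profile.
REQUIRED_EXTENSIONS = {
    "https://stac-extensions.github.io/projection/v1.1.0/schema.json",
    "https://stac-extensions.github.io/raster/v1.1.0/schema.json",
}

def satisfies_profile(item: dict) -> bool:
    """Return True if a STAC item declares every extension the profile requires."""
    declared = set(item.get("stac_extensions", []))
    return REQUIRED_EXTENSIONS.issubset(declared)
```

A validator would then run the JSON Schema of each required extension against the item, rather than just checking that the extension is declared.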
The list of published CEOS-ARD Product Family Specifications.
CEOS-ARD (formerly known as “CARD4L”) defines “product family specifications” for eight product types which are either categorized as “Optical” or “Radar”. CEOS-ARD describes itself as follows:
CEOS Analysis Ready Data (CEOS-ARD) are satellite data that have been processed to a minimum set of requirements and organized into a form that allows immediate analysis with a minimum of additional user effort and interoperability both through time and with other datasets.
This is a great resource to start from, especially with regards to the data pre-processing requirements. A lot of smart people from different space agencies have worked on it and converged on a set of minimum requirements that are accompanied by additional optional requirements. The requirements are currently specified in (mostly tabular) PDF/Word documents, but they are in the process of migrating to a GitHub- and Markdown-based process. Implementers can go through the documents and check to see if they fulfill the requirements. For the synthetic aperture radar (SAR) product family specification, there’s additionally an XML metadata encoding available.
CEOS-ARD is a great foundation, and we have identified several areas where we believe it could be made even better.
The SAR product family specification includes a large number of requirements, which makes it difficult to implement. SAR data is inherently complex, so I don’t expect we can change a lot, but there are likely some small tweaks we can propose to CEOS.
For example, the SAR product family specification includes various file related metadata requirements that mostly apply to RAW data (e.g., header size and the byte order). Is RAW data really analysis-ready if such properties must be known? Usually software should handle that. Building on this, perhaps ARD formats should only be based on formats that are readable by GDAL because that is what most users use. In the geospatial world, GDAL is effectively a de-facto standard at this point, as most users use it directly or use software that uses it under the hood.
Ultimately, we should aim to make the ARD specification as complex as needed but as simple as possible. We also need to be careful to not have too many optional requirements and instead prioritize requirements that bring the most value to most users.
CEOS-ARD is already somewhat split into building blocks, which are good enough for the purpose of CEOS-ARD. In general, however, they are too broadly scoped.
For example, one building block aims to cover all “product metadata” rather than discrete metadata about the capturing satellite or projection information. Ideally, CEOS-ARD would be defined in smaller blocks that the individual product family specifications can pick from, minimizing the development time of new product family specifications.
These building blocks could even match STAC extensions, but that might be difficult in some cases due to different scoping (STAC was primarily designed for search and discovery) and due to the fact that they are cleanly discretized (more on that below).
Another reason to split into smaller building blocks is that it allows smaller groups of experts to more easily work on individual building blocks. Limiting the scope ensures that the work is still manageable and you can get to a result in a reasonable amount of time. I believe we should aspire to follow the best practices of categorization / discretization as defined by Peter Strobl (JRC) (the slides are available through the OGC portal):
- assessable: all characteristics used for distinguishing categories must be identified, and objective/measurable
- unambiguous: categories mutually exclude each other
- gap free: each item can be assigned to a specific category
- intrinsic: assignment of an item to a category is independent of other, not previously agreed characteristics
- instantaneous: assignment of an item to a category is independent of that of other items
- product family specifications: different granularity in categorization is achieved by a hierarchy
Getting the categorization of the building blocks right will be very challenging, but I believe that once we have that settled it will be much easier to fill them with definitions for ARD. From there, it will be easier to evolve them into sets of building blocks that will be useful for specific types of data.
Users currently seem to benefit a lot from STAC, as its standardization of metadata brings them one step closer to ARD already. CEOS-ARD has a separate XML-based metadata language, which is not ideal. I believe CEOS-ARD should be expressed in STAC, because it provides the metadata in a form that people already use and is supported by a vibrant software ecosystem.
ARD will only be adopted if it is supported by a software ecosystem that has users. Any approach to ARD should try to benefit from an existing ecosystem of software and learning resources, which STAC offers. Analysis-ready data and metadata are not, on their own, analysis-ready. They always need software that can actually benefit from the metadata and make the data available to users in a way that lets them start directly in their comfort zone – usually their domain knowledge plus an analysis environment such as Python, R, QGIS, or some other more or less popular application or programming language.
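For example, if CEOS-ARD metadata were expressed as STAC, existing tooling such as pystac could already validate an item against the core spec and every extension it declares. A rough sketch, with a placeholder file path:

```python
# Sketch: validating a STAC item against the core spec and its declared extensions with pystac.
# "item.json" is a placeholder; validation needs the optional jsonschema dependency installed.
import pystac

item = pystac.Item.from_file("item.json")
item.validate()  # checks the core item schema plus every schema listed in stac_extensions
print(item.stac_extensions)
```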
How does all this relate to the ongoing work in the OGC/ISO ARD Standards Working Group?
We believe that such a complex and large topic can’t be built top-down; it should be built bottom-up. It needs to evolve over time. Only once it has been iteratively improved, has implementations, and has gained some adoption should it go through standardization.
ISO specifically doesn’t seem to be agile enough for this incremental approach, and I’m not sure about OGC. The emergence of STAC is an example of how a community-led de-facto best practice can emerge and then “graduate” into becoming a de-jure OGC (community) standard.
We believe that the STAC-based approach for CEOS-ARD described in this post brings a lot to the table, and that any ISO/OGC work on ARD – however it continues in the next months – should be based on it. More specifically, we believe that ISO/OGC’s approach to ARD should be a superset of CEOS-ARD. If (breaking) changes need to be proposed by the ISO/OGC work, they should be fed back into CEOS-ARD. If we end up with multiple more or less similar ARD specifications, everyone will lose.
I hope this post provokes discussions among our community members who care about ARD. You may not agree with any of this, and it might not be the ultimate solution to ARD, but it may be a practical interim step while we figure out a “full-fledged” ISO ARD standard. Let us know what you think!
Drawing from what I’ve described here, the proposed next steps could be:
The GRASS GIS 8.4.0RC1 release provides more than 515 improvements and fixes with respect to the release 8.3.2. Please support us in testing this release candidate.
The post GRASS GIS 8.4.0RC1 released appeared first on Markus Neteler Consulting.
Tell Us About Yourself My name is Tracy Homer. I graduated from University of Tennessee, Knoxville in December 2023 with a degree in GIS. I currently work for the Software Freedom Conservancy – a non profit organization that focuses on ethical technology and open source license compliance, so I only do mapping in my fun […]
The post Maps and Mappers of the 2024 Calendar – Tracy Homer – June appeared first on GeoHipster.
Topographical maps are detailed maps that use contour lines to show the appearance of the earth’s surface. These maps accurately represent earth features like roads, buildings, railways, and mountains, among other features. Therefore, topographical maps illustrate any natural or artificial geographical feature on the earth’s surface. Other than locating artificial and natural features, topography can […]
The post Applications of Topographical maps first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
In the paper we use social, demographic and economic (e.g., US Census) variables to predict COVID-19 vaccine hesitancy levels in the ten most populous US metropolitan statistical areas (MSAs). By using machine learning algorithms (e.g., linear regression, random forest regression, and XGBoost regression) we compare a set of baseline models that contain only these variables with models that incorporate survey data and social media (i.e., Twitter) data separately.
We find that different algorithms perform differently along with variations in influential variables such as age, ethnicity, occupation, and political inclination across the five hesitancy classes (e.g., “definitely get a vaccine”, “probably get a vaccine”, “unsure”, “probably not get a vaccine”, and “definitely not get a vaccine”). Further, we find that the application of the models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. But in summary, this paper shows social media data’s potential for understanding vaccine hesitancy, and tailoring interventions to specific communities. If this sounds of interest, below we provide the abstract to the paper along with our mixed methods matrix, data sources used and the results from the various MSAs. At the bottom of the post, you can see the full reference and the link to the paper so you can read more if you so desire.
The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet, vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To effectively address this issue, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, recent sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan areas. We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that XGBoost algorithm consistently outperforms Random Forest and Linear Regression, with marginal differences between Random Forest and XGBoost. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of the latter data as a complementary information source. Results also reveal variations in influential variables across the five hesitancy classes, such as age, ethnicity, occupation, and political inclination. Further, the application of models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. In summary, this study underscores social media data’s potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.
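Purely as a generic illustration of that kind of model comparison (not the paper’s actual pipeline, features, or data), the three regressor families mentioned above can be compared with cross-validated R²:

```python
# Generic sketch: comparing linear regression, random forest, and XGBoost regressors
# on synthetic stand-in data with cross-validated R^2. Not the study's pipeline.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "xgboost": XGBRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(name, round(scores.mean(), 3))
```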
Data sources used in our study.
MSA model performance (bolded adjusted R² values represent the best performing model for each modeling technique and MSA).
Reference:
Sasse K, Mahabir R, Gkountouna O, Crooks A, Croitoru A (2024) Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media. PLoS ONE 19(6): e0301488. https://doi.org/10.1371/journal.pone.0301488 (pdf)
Locker room layouts (Source: Gao et al., 2024)
Models have ranged from looking at the spatial arrangement of locker rooms at ski resorts (Gao et al., 2024) to lift lines (congestion) in places such as La Plagne in the French Alps (Poulhès and Mirial, 2017) or the Austrian ski resort of Fanningberg (Heinrich et al., 2023). Others have simulated entire ski areas, including lift lines, slopes used, etc. (Kappaurer 2022), while Pons et al. (2014) developed an agent-based model to see how climate change might impact where skiers go. Others have explored how climate change might impact ski areas and their associated water usage for making snow (e.g., Soboll and Schmude 2011). Keeping with the climate theme, Revilloud et al. (2013) have used agent-based simulations to simulate snow height on ski runs based on skiers’ movements in order to facilitate snow cover management (i.e., reduce the production cost of artificial snow and thus water and energy consumption). Murphy (2021) developed a simpler agent-based model of how skiers might ski during a powder day and explores the area of terrain they may cover based on ability.
Simulation of skiers (Source: Revilloud et al., 2013)
Similar to some of the other models above, but in light of COVID-19, Integrated Insight (2020), an analytics consulting company, shows in the video below how one can use simulation to explore crowd management in the base areas of ski resorts.
Back to entry 1
So, I got the news from pathology.
There is no cancer left in me, I am officially “cured”.
Since I am still recovering from surgery and relearning what my GI tract is going to do for the future, I don’t feel entirely cured, but I do feel the weight of wondering about the future lifted off of me.
The future will not hold any more major cancer treatments, just annual screening colonoscopies, and getting better post-surgery.
I truly have had the snack-sized experience, not that I would recommend it to anyone. Diagnosed late February, spit off the back of the conveyor belt in late May. Three months in Cancerland, three months too many.
A few days ago NBA great Bill Walton died of colorectal cancer. It’s the second most common cancer in both men and women, and you can avoid a trip to Cancerland through the simple expedient of getting screened. Don’t skip it because you are young: colorectal cancer rates among people under 50 are going up fast, and nobody knows why (there’s something in the environment, probably).
The top 50 words appearing in the titles across all 50 years and from the 1970s to the 2020s.
The evolution of topics over time based on topic modeling.
Crooks, A.T. (2024), Environment and Planning B: Its Shaping of Urban Modeling and Me, Environment and Planning B, 51(5) 1020-1022. (pdf)
Crooks, A.T., Jiang, N., See, L. Alvanides, S., Arribas-Bel. D., Wolf, L.J. and Batty, M. (2024), EPB Turns 50 Years Old: An Analytical Tour of the Last Five Decades, Environment and Planning B, 51(5): 1028-1037. (pdf)
On behalf of the Cloud-Native Geospatial Foundation, we invite your organization to become a sponsor for the upcoming working sprint on Enhancing Air Quality Data Access in Africa. This event will be held on July 30-31, 2024 at the Radisson Blu Hotel & Convention Centre in Kigali, Rwanda. Building on Rwanda’s leadership in air quality initiatives, we’re convening local professionals, as well as experts from across Africa and around the world.
Africa faces some of the most severe air quality challenges. Fortunately, the affordability of air quality sensors has led to improved monitoring and a wealth of data across the continent and the world. While this is a positive development, the lack of common data schemas and file formats hinders effective sharing and analysis, limiting this data’s potential to inform air quality solutions. By convening this sprint in Africa, we can leverage the expertise and perspectives of African air quality professionals to explore the need for accessible and user-friendly data formats to make air quality data easier to access and use.
The overall goal is to facilitate easier air quality data sharing by understanding how air quality data is used to inform decision-making and environmental policy by leaders in Africa. This will not only empower African research communities, many of whom are often excluded from global working sprints due to visa restrictions, but also benefit air quality analysis efforts worldwide. This sprint is by invitation only, but do sign up here to stay informed about future air quality initiatives.
We offer various sponsorship packages to suit your organization’s goals:
Convening Sponsor - pledge at least $10,000. This sponsorship level includes:
Platinum Sponsor - pledge at least $5,000. This sponsorship level includes:
Gold Sponsor - pledge at least $2,500. This sponsorship level includes:
Silver Sponsor - pledge at least $1,000. This sponsorship level includes:
Supporter Sponsor - pledge at least $500. This sponsorship level includes:
To submit your sponsorship pledge and help us ease air quality data sharing, please fill out the form. If you have any questions before or after submitting, please email [email protected]
There may come a time in your mapping application where you need the user to identify a point of interest within a specific area. For example, having the user mark public bathrooms in a park, pinpoint hazards in a construction zone, or identify bus stops within a city. For one project at Sparkgeo, the application …
The post Geofencing a Mapbox GL marker using Turf appeared first on Sparkgeo.
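The excerpt above is about Turf and Mapbox GL; as a rough, library-agnostic sketch of the underlying geofence check (is a user-placed marker inside the allowed area?), here is the same idea with Shapely and made-up coordinates:

```python
# Rough sketch of the geofencing check described above, using Shapely instead of Turf.
# The park polygon and marker coordinates are made up for illustration.
from shapely.geometry import Point, Polygon

park = Polygon([(-123.12, 49.30), (-123.12, 49.31), (-123.10, 49.31), (-123.10, 49.30)])
marker = Point(-123.11, 49.305)

if park.contains(marker):
    print("Marker accepted: inside the geofence")
else:
    print("Marker rejected: snap it back inside the boundary")
```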
Following up on the importance of data schemas and IDs blog post, I wanted to dig into the topic of data schemas. In the Cloud Native Spatial Data Infrastructure section, it posited that instead of a model like OpenStreetMap, where everyone contributes to a single database, a better inspiration might be open source software. This way of working wouldn’t require everyone to follow the same set of community and governance norms — it would encourage different approaches and more experimentation, and a wider variety of data types to collaborate around. That section closed with the thought:
We … don’t envision a single data schema that everyone has to align to. Instead, there’s a way to start with a small, common core of information that gives data providers the flexibility to use the pieces that are relevant to them and easily add their own.
I believe the foundation we’ve been laying with Cloud Native Geospatial formats has the potential to lead to much greater interoperability between data, if we can get a few things right. So, I wanted to use this post to explore how we can do things a bit differently and potentially get closer to that vision by building on these great new formats.
The first thing that we can do differently from what came before is to flexibly combine different data schemas. The key is to move away from how the geospatial world has done things with XML, particularly the way validation works. In the XML world, you’d use an XML Schema to define how each field should work, and then the same XML Schema would validate if the fields in your document had the proper values. The big problem was that if an extra field was added, then the validation would fail. You could, of course, extend the XML Schema definition, but it’s not easy to ‘mix and match’ — to just have a few different XML Schemas validate different parts of your data.
The cool thing is that with JSON Schema, you can easily do this. Six different JSON Schemas can all validate the same JSON file, each checking their particular part. And the situation is similar with Parquet & GeoParquet: there’s nothing that will break there either if the data has an extra column. We’ve used this to great effect in STAC — the core STAC spec only defines a few fields, and includes the JSON Schema to validate the core. But each ‘extension’ in STAC also has its own JSON Schema. Any STAC Validator will use each extension’s schema to validate the whole file.
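A minimal sketch of that pattern with the Python jsonschema library (the two toy schemas below simply stand in for a “core” schema and an “extension” schema):

```python
# Sketch: several independent JSON Schemas each validating their own slice of one document.
from jsonschema import validate

core_schema = {
    "type": "object",
    "required": ["id", "geometry"],
    "properties": {"id": {"type": "string"}},
}
extension_schema = {
    "type": "object",
    "properties": {"view:off_nadir": {"type": "number", "minimum": 0}},
}

doc = {"id": "scene-001", "geometry": {"type": "Point", "coordinates": [0, 0]}, "view:off_nadir": 3.2}

for schema in (core_schema, extension_schema):
    validate(instance=doc, schema=schema)  # extra fields don't fail validation unless a schema forbids them
print("document passes both schemas")
```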
This has profound implications, since it enables a much more ‘bottom up’ approach to the evolution of the ‘data schema.’ In the XML world, you needed everyone to agree, and if someone disagreed they’d need to fork the XML Schema and redefine what they wanted. Users would then need to pick between which validation they wanted to use. So, it was really important that a top down entity set the standard. With the bottom up approach, anyone can define just a few fields for their own use, and others can define similar things in their own way. Of course, the core ‘thing,’ like SpatioTemporal Assets, needs to be done well. But I believe the key here is to make that core as simple as possible, so many extensions can thrive. Then, it’s just real world usage that decides which fields are important. And it also allows ‘incubation’ — a single organization can just decide to make their own validator for their fields. An example of this is Planet, with the Planet STAC Extension:
It uses a bunch of the common extensions like view, eo, proj and raster. And then it has some fields that are very specific to Planet’s system (item_type, strip_id, quality category), but there are a number of fields for which there is not yet a STAC extension. I say ‘yet’ because these are fields that are likely things that other satellites have for metadata, like clear_percent, ground_control and black_fill. They can be defined for Planet’s validation, since their users expect them. But others in the community can also look at Planet’s and decide to adopt what they did. And if a few different providers all do something similar, we can come together and agree on a common definition that we’ll all use. If two people decide to define the same ‘thing’ and both feel theirs is right, then both can exist, but likely one will gain more adoption and become the standard.
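The item itself isn’t reproduced here, but purely to illustrate the pattern (field names are indicative of the approach, not an authoritative copy of Planet’s extension):

```python
# Illustrative fragment of a STAC item mixing shared extensions with provider-specific fields.
# Field names are indicative only, not an authoritative copy of Planet's metadata.
item_fragment = {
    "stac_version": "1.0.0",
    "stac_extensions": [
        "https://stac-extensions.github.io/view/v1.0.0/schema.json",
        "https://stac-extensions.github.io/projection/v1.1.0/schema.json",
        # a provider-hosted schema for provider-specific fields would be listed here too
    ],
    "properties": {
        "view:off_nadir": 0.2,              # from the shared "view" extension
        "proj:epsg": 32633,                 # from the shared "projection" extension
        "pl:quality_category": "standard",  # provider-specific field, validated by the provider's own schema
    },
}
```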
One of my main observations from the last twenty years of working with standards is that having more data following the standard is the key to success. The best-thought-out specification that everyone agrees on will lose out to a poorly specified way of doing things that has tons of data that everyone actually uses. This dynamic was clear with GML vs KML. The latter didn’t even have a formal specification for many years, but there was tons of data saved as KML. And the reason there was tons of data in KML in the first place is because it gave you access to an unprecedented amount of data in Google Earth for free — even if that data wasn’t itself available in KML.
So whatever dataset is the largest and most important in an ecosystem usually becomes the standard way of doing things, even if it didn’t set out to be ‘a standard’ — it becomes the defacto standard. Key to STAC’s success was that early on both Landsat and Sentinel 2 were available in the standard, and indeed it enabled those two major datasets to be more interoperable with each other.
With the explosion of satellite imagery and the continued advances in AI and computer vision we’re seeing more datasets that are truly global. The best of these will play a major role in setting the standard data schemas for whatever type of data they represent. Indeed I think we’ll see an interplay between aligning schemas for validated training data about foundational geospatial data types and the schemas of the resulting models. And hopefully, we’ll see major governmental data providers stepping up to the role they play in setting standards — a federal government defining a reusable schema for a particular domain, putting foundational data out in it, and also encouraging each state to use the same schema.
Overture Maps is doing really great work in building open global datasets for some of the most foundational geospatial layers, leveraging AI extensively. And they are taking their role in setting a data schema standard seriously, working to make it a flexible core that other attributes can be added to. I’m also excited to be working on fiboa as part of Taylor Geospatial Engine’s Field Boundary Initiative. It’s centered around the potential to use AI and Earth Observation data to build global datasets, and it’s doing some great innovation around the core data schemas to enable that.
I believe one of the main mistakes of past Spatial Data Infrastructure efforts was to try to punt on the hard problem of getting people to align their data. The message was that everyone could keep their database in its existing schema, and the application servers delivering the APIs could just transform everything into standard schemas on the fly. This proved to be incredibly annoying to get right, as being able to map from anything to a complex data schema isn’t easy, and the tools to help do this well never really took off.
I think the fact that a Cloud-Native Spatial Data Infrastructure is fundamentally based on formats instead of APIs means that it will force people to confront the hard problem of actually aligning their data. We should be trying to get everyone actually using the same data schema in their day-to-day work, not just doing their internal work in one schema and transforming it into another schema to share it with others. It’ll be much easier if you share the buildings file from the city of Belém and its core attributes follow the same definitions as Overture. We shouldn’t have to go through data interchange servers (like Web Feature Services) just to share interoperable data: our goal needs to be making the actual data interoperable.
Obviously it’s unrealistic to expect everyone to just change their core database to a new schema. But it’s easier to use an ETL tool or a bit of code to actually transform the data and publish it than it is to set up a server and define an on the fly schema mapping against its database. And if we can get small core schema definitions with easy to use extensions then we won’t need to convince Belém to drop their schema and fully adopt ‘the standard’ — they should be able to update a couple core fields and define their own extensions that match their existing data schema, and slowly migrate to implementing more of the standard extensions.
The other thing we are starting to do is align with all the investment in mainstream data science and data engineering. One of the ways the founders would explain Planet in the early days was that they were leveraging the trillions of dollars of investment that has gone into the cell phone. Other satellites would buy parts that were ‘made for space,’ and were egregiously expensive because they were specially designed, with no economies of scale. Planet bought mostly off the shelf components, and was able to tap into the speed of innovation of the much bigger non-space world.
By embracing Parquet, we’re starting to do the same thing with geospatial. I think the comparison is apt — we’ve tended to build our own special stacks, reinventing how others do things. Open source geospatial software has been much better, like with PostGIS drafting off of PostgreSQL. But there is now huge investment going into lots of innovation around data.
In the context of data schemas, there are many people looking at data governance, and tools to define and validate data schemas. And so, we should be able to tap into a number of existing tools to do what we want with Parquet, instead of having to build all the tools from scratch.
There’s some relatively easy things we can do to usher in an era of bottom-up innovation in data schemas. This should lead to much greater collaboration, and hopefully start a flywheel of Cloud-Native Spatial Data Infrastructure participation that will lead to a successful global SDI.
I think the key is to make it easy to create simple core schemas with easy to define extensions on top of the core. This mostly means creating a core toolset so anyone can create a schema, translate data into the schema, and validate any data against both that schema and its extensions.
The cool thing is that I think STAC has defined a really good way to do this, that just needs to be generalized and enhanced a bit.
So STAC is ready to use if you want a data schema for data where the geometry is an indicator of the footprint of some other type of data, and there are links to the actual data. And then you can tap into all sorts of STAC extensions that help define additional parts of a flexible data schema. But if your data is like the vast majority of vector data, where the geometry and properties are the data, not metadata about some other data, then you can’t tap into all the great extensions and validation tools.
To generalize what STAC does, I believe we can build a construct that lets any type of vector data define a core JSON Schema and links to extensions. With STAC, you just look for stac_version and then you know that it can validate against the core STAC extension, and then stac_extensions is a list of links to the JSON schemas of the extensions it implements. The links are naturally versioned, as part of the URL of the schemas.
A general version could just have a definition that links to a single core schema (validating the geometry and any other attributes that are considered ‘core’). And then an extensions list that works the exact same way as STAC extensions. It perhaps could even directly use some STAC extension definitions like the MGRS extension, which could be used by any number of vector datasets that want to include MGRS:
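A speculative sketch of what such a generalized record could look like (all URLs are illustrative placeholders):

```python
# Speculative sketch of the generalized pattern described above: a dataset-level record
# pointing at one core schema plus a list of extension schemas. URLs are illustrative.
dataset_definition = {
    "definition": "https://example.org/schemas/vector-core/v0.1.0/schema.json",  # validates geometry + core attributes
    "extensions": [
        "https://stac-extensions.github.io/mgrs/v1.0.0/schema.json",             # reused as-is, as suggested above
        "https://example.org/schemas/roof-attributes/v0.1.0/schema.json",
    ],
}
```

Versioning would work the same way it does in STAC: the version is simply part of each schema URL.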
Close readers will likely note that this approach would all unfortunately be incompatible with STAC, since STAC has hard coded versions. But I think if a wider ecosystem takes off in a big way we could consider a STAC 2.0 that fits properly into the hierarchy. And there’s probably some less elegant hacks you could do to make it all work together if that was needed.
The other bit that would make a ton of sense to generalize is the STAC extension repository template. This to me is one of the most clever parts of the STAC ecosystem, and it’s all thanks to Matthias Mohr.
At its core, it gives you a clear set of guidelines for filling out your own extension. You don’t need to check three other extensions to see how they do it; you just change the right places for yours, and it then ‘fits’ with the ecosystem. But it goes far beyond that, as it clones a set of continuous integration tools. It will automatically check your markdown formatting, and once you finish your JSON Schema it will also check that all your examples conform to STAC and your defined extension.
And then when you publish a release it will automatically publish the JSON Schema in your repo on github pages, to be the official link. STAC validators can then immediately make use of it. So to create any new version of your extension you just need to cut a release. I hadn’t even known that anything like this was possible, but it made it such a breeze to create a new extension. You can focus on your data model, and not on how to release it and integrate into tooling, since it all ‘just works.’
Astute readers will realize that the recently announced fiboa project has explored a number of these ideas. It is a vector dataset focused on field boundaries, and it defined a small core and flexible extensions; Matthias adapted the STAC extension template concepts to a fiboa extension template. We have yet to go all the way to a ‘definition’ schema defined by a link to the core schema, as it felt like too much complexity for the first version, but if others start to do similar things we could do so by 1.0.
So I think there’s a few ways we’d ideally go beyond just generalizing how STAC does things. The first one is to be compatible with GeoParquet. GeoParquet is a much more naturally default format for vector data on the cloud than GeoJSON is, and its support of different projections also will help support a wider variety of use cases. GeoJSON worked well for STAC, particularly because we had both the static STAC and the STAC API options. I originally imagined that large data sets would naturally be stored in a database and use a server that clients would query. But the fully cloud native approach has been quite appealing, and a number of very large datasets are just on object stores, and consist of millions of individual JSON files (next to the actual data files).
We’ve recently started to standardize on how to represent a full STAC collection in GeoParquet with the STAC GeoParquet Spec. I’ll hold off on a deep dive on that, but it’s pretty cool to be able to just query the entire STAC catalog without needing an API.
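For instance, with a STAC GeoParquet file sitting on disk or object storage, an analytical engine like DuckDB can filter items directly, with no API in front of it. A sketch with a placeholder path and common STAC column names:

```python
# Sketch: querying a STAC GeoParquet collection directly, with no API in front of it.
# "sentinel2-items.parquet" is a placeholder path; "id" and "datetime" are common STAC fields.
import duckdb

result = duckdb.sql("""
    SELECT id, "datetime"
    FROM 'sentinel2-items.parquet'
    WHERE "datetime" >= TIMESTAMP '2024-01-01'
    LIMIT 5
""").df()
print(result)
```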
So for non-’asset’ data, where the geometries and properties are the data, not metadata, GeoParquet makes much more sense than GeoJSON as the main distribution format. But my hunch is that it likely will still make sense to define data schemas in more human readable formats. There is some argument for defining the core schemas completely abstractly, in something like UML, since formats will continue to change and we should be adaptable to that. But from my experience that introduces an unnecessary layer of abstraction. With STAC I actually started a SpatioTemporal Asset Metadata (STAM) spec, see https://github.com/radiantearth/stam-spec, to try to make abstract definitions that could map to JSON but also other formats (like GeoParquet though it didn’t exist yet, or as Tiff tags in a GeoTIFF). But it was a pain to try to maintain both and just didn’t add much.
For fiboa, Matthias defined something a bit less abstract than UML: a human-readable, YAML-based language to describe the attributes and constraints, named fiboa Schema for now. Pure JSON Schema didn’t quite work, since JSON has a limited number of types, so it couldn’t precisely describe the different data types commonly found in file formats such as GeoPackage and GeoParquet. Our intention is also that users create extensions, which need a separate schema definition for the added properties. As such, the language is much simpler, to lower the entry barrier for newcomers, as we found in STAC and other projects that JSON Schema is too difficult for many. Nevertheless, the language is based on JSON Schema, so it can easily be converted to valid JSON Schema. We’re not sure if it’s the right answer for all time, but it is working pretty well as a way to easily define schema information and have it validate in both JSON Schema and Parquet.
Overture is also using JSON Schema to define their schemas, so it’s nice they’re thinking in similar directions. Nevertheless, they’ve also identified the data type limitations that we’ve seen. For example, temporal information is stored in strings instead of native temporal data types, and numerical types always use the biggest container (e.g. int32 instead of uint8). They say the final format for Overture deliveries hasn’t been set yet, but the most recent releases were GeoParquet. They also recently embraced snake_case for their naming, which is a minor detail, but does make it easier to potentially share extensions between the projects. My hope is that we can get to some common tools that let projects define schemas for vector data with properties across various file formats such as GeoParquet in human-readable formats and get automatic validation tools. We are also experimenting and discussing with other related projects such as Overture to get feedback on our approach to make it general enough for other use cases.
One final idea to generalize and hopefully really enhance is STAC Index. STAC Index provides a list of all public STAC Catalogs, and you can use a STAC Browser to easily browse the full extent of any of them. One of the original ideas of STAC Index was to crawl all the catalogs and provide lots of interesting stats on them. Tim Schaub built a crawler to do this and reported on the results in the State of STAC blog post, but ideally it would be a continuous crawling and reporting of stats.
I believe the key to a Cloud-Native Spatial Data Infrastructure is to make it really easy to measure how successful adoption has been. Not just counting the total number of GeoParquet datasets, but tracking more details to be able to get some real nuance. Things like number of rows in GeoParquet, stats by particular data schemas (like fiboa), global spatial data coverage by data type, etc. There is a chance we’ll still use STAC, but likely just at the ‘collection’ level, to provide metadata on GeoParquet files. Though that’s something we’re still figuring out in fiboa — if we should adopt STAC Collection directly or just aim to be compatible with it — the latter feels a bit simpler.
So STAC Index should be enhanced to really be the ‘Cloud-Native SDI Index’, and serve as a clear KPI and scoreboard to really measure adoption. I should probably spend a whole blog post on the topic of measuring standard adoption at some point, as I have a suspicion that making it really easy to measure the actual adoption is a powerful lever to drive adoption.
So for me the next major project is to work on flexible schemas for foundational data sets. One obvious one is buildings, and Overture is doing incredible work there. They’re truly crushing it on figuring out a core, flexible data schema with global ideas, but I think it’d be awesome to build on what they’re doing in two ways. The first is to try to build validation tooling, and experiment with making other building datasets ‘overture compatible’. And the second would be to try out ‘extending’ their core schema with some additional fields. Perhaps something like building color, or roof type. I suppose the ideal would be to find some better attributed building dataset and make it Overture compatible — try to conflate the geometries, merge the common attributes, and make a schema for the extended attributes.
And then in fiboa we’re taking a serious run at defining flexible schemas for field boundaries and ag-related data. If you’ve got interest in that then please join us! And if you’re interested in a different domain than buildings and field boundaries then don’t hesitate to take these ideas and run with them. And I just talked to a group about doing the same approach for forest data, so if you’re interested in collaborating on that, let me know. We don’t have an explicit channel on the Cloud Native Geo Slack, but I’m sure we can start on #geoparquet or #general and then spin one up.
A while ago on Twitter I read this cool post on sub-national GDP.
This does seem like another opportunity — use Overture locality schema for sub-national boundaries at the core but add an attribute for GDP and other economic stats, and harmonize all the datasets listed.
I’d also love to hear of other opportunities for collaborations on data schemas, and other success stories where there are interoperable data standards, particularly in domains that are further afield. I think it’d be interesting to try to adapt some of the successful ones into cloud-native formats, just to see if it works and if it adds value. So if anyone wants to work on that, don’t hesitate to get in touch.
The notion of physical space has long been central in geographical theories. However, the widespread adoption of information and communication technologies (ICTs) has freed human dynamics from purely physical to also relational and cyber spaces. While researchers increasingly recognize such shifts, rarely have studies examined how the information propagates in these hybrid spaces (i.e., physical, relational, and cyber). By exploring the vaccine opinion dynamics through agent-based modeling, this study is the first that combines all hybrid spaces and explores their distinct impacts on human dynamics from an individual’s perspective. Our model captures the temporal dynamics of vaccination progress with small errors (MAE=2.45). Our results suggest that all hybrid spaces are indispensable in vaccination decision making. However, in our model, most of the agents tend to give more emphasis to the information that is spread in the physical instead of other hybrid spaces. Our study not only sheds light on human dynamics research but also offers a new lens to identifying vaccinated individuals which has long been challenging in disease-spread models. Furthermore, our study also provides responses for practitioners to develop vaccination outreach policies and plan for future outbreaks.
Keywords: Agent-based modeling, hybrid space, opinion dynamics, Covid-19, vaccination.
Flowchart of the modeling process.
Comparing predicted and observed vaccination rates among different age groups, using the weight combination 3 (physical), 1 (relational), 1 (cyber) for hybrid spaces.
Comparing predicted and observed vaccination rates by varying weights of hybrid spaces for different age groups.
Spatial distribution of Covid-19 vaccines. (a)-(d) Point density of vaccination allocation at different time steps. (e) Predicted vaccination rates at census block group level.
Yin, F., Crooks, A.T. and Yin, L. (2024), How information propagation in hybrid spaces affects decision-making: using ABM to simulate Covid-19 vaccine uptake, International Journal of Geographical Information Science, https://doi.org/10.1080/13658816.2024.2333930 (pdf)
Back to entry 1
Scanxiety.
This is where I am right now. Scanxiety.
Each stage of the cancer experience is marked by a particular set of tests, of scans.
I actually managed to get through my first set of scans surprisingly calmly. After getting diagnosed (“there’s some cancer in you”), they send you for “staging”, which is an MRI and CT scan.
These scans both involve large, Star Trek-seeming machines, which make amazing noises; the CT machine I was put through had even been decorated with colorful LED lights by the manufacturer (because it didn’t look whizzy enough to start with?).
I kind of internalized the initial “broad-brush” staging my GI gave me, which was that it was a tumor caught early so I would be early stage, so I didn’t worry. And it turned out, that was a good thing, since the scans didn’t contradict that story, and I didn’t worry.
The CT scan, though, did turn up a spot on my hip bone. “Oh, that might be a bone cancer, but it’s probably not.” Might be a bone cancer?!?!?
How do you figure out if you have “a bone cancer, but it’s probably not”? Another cool scan, a nuclear scan, involving being injected with radioactive dye (frankly, the coolest scan I have had so far) and run through another futuristic machine.
This time, I really sweated out the week between the scan being done and the radiology coming back. And… not bone cancer, as predicted. But a really tense week.
And now I’m in another of those periods. The result of my major surgery is twofold: the piece of me that hosted my original tumor is now no longer inside of me; and, the lymph nodes surrounding that piece are also outside of me.
They are both in the hands of a pathologist, who is going to tell me if there is cancer in the lymph nodes, and thus if I need even more super unpleasant attention from the medical system in the form of several courses of chemotherapy.
The potential long term side effects of the chemotherapy drugs used for colorectal cancers include permanent “peripheral neuropathy”, AKA numbness in the fingers and toes. Which could put a real crimp in my climbing and piano hobbies.
So as we get closer to getting that report, I am experiencing more and more scanxiety.
If I escape chemo, I will instead join the cohort of “no evidence of disease” (NED) patients. Not quite cured, but on a regular diet of blood work, scans, and colonoscopy, each one of which will involve another trip to scanxiety town. Because “it has come back” starts as a pretty decent probability, and takes several years to diminish to something safely unlikely.
Yet another way that cancer is a psychological experience as well as a physical one.
Talk to you again soon, inshalla.
Back to entry 1
I have a profoundly embarrassing cancer. Say it with me: “rectal cancer”. “Rectal cancer”.
Why is it embarrassing?
Poop!?! Maybe we are all still six, somewhere deep inside.
When Ryan Reynolds got a colonoscopy on camera, to raise awareness of colorectal cancer screening, part of the frisson of the whole thing was that yes, somehow having this procedure done is really embarrassing.
So, watch the video, it’s really nothing but an ordinary medical procedure that could very well save your life. And Ryan Reynolds is charming.
Meanwhile, colo-rectal cancers remain tough to talk about, because frankly the colonoscopy is the least of it.
Not having control of your bowels is, well, really embarrassing in our culture. What do people say about elderly presidential candidates they hate? They call them incontinent. They intimate that they wear adult diapers (gasp!).
Do you know who else gets to wear adult diapers? Colorectal cancer patients. We get our insides man-handled, irradiated and chopped up, and the results are not great for bowel control. It happens if you’re 55, it happens if you’re 35. It’s normal, it’s usually temporary, it’s what happens when you insult a GI tract badly enough.
Another rite of passage in treatment is the ostomy. Stage III rectal cancer treatment usually involves a temporary ostomy, after radio-chemotherapy during the resection of the part of the rectum that holds the tumor. Patients with a low (near the anus) tumor location will sometimes require a permanent ostomy, because the tumor cannot be removed without damaging the anus.
When I was diagnosed, I was initially terrified of the ostomy. “The bag.”
After researching the different treatments, I got a lot less terrified, since the side effects of some non-bag outcomes in terms of quality of life can be pretty terrible. Meanwhile folks with ostomies are out hiking, biking, and swimming.
If this talk is all a little uncomfortable, may I recommend a colonoscopy?
And after that, a big meal and some poooooping! Poop! Poop! Poop!
I’m in a pooping mood because my surgery (2 weeks ago now) has left me, not incontinent, but I guess “disordered” is a better word. You know how it feels to really need to take a dump? Imagine feeling that 12 hours a day, even when you don’t actually have anything to dump.
By most measures I think I am ahead of the median patient in recovery from LAR surgery, but unfortunately the recovery time for things like bowel regularity and “normalcy” (the “new normal” will always be somewhat worse than the “old normal”) is measured in months, not days, so I am a little impatient to improve more, and faster.
Talk to you again soon, inshalla.
Back to entry 1
“Anything that’s human is mentionable, and anything that is mentionable can be more manageable. When we can talk about our feelings, they become less overwhelming, less upsetting, and less scary. The people we trust with that important talk can help us know that we are not alone.”
– Fred Rogers
When I found out I had rectal cancer, I hit the internet hard and immediately found the ColonTown community of communities. It has turned out to be simultaneously the most reassuring and the most anxiety producing place on the internet for me.
On the reassuring side, even though colorectal cancer is the third most prevalent cancer worldwide, it is not widely talked about, so the community was reassuring: there are other people out there going through this; many are already through it.
There are also a lot of aspects of the rectal cancer treatment process that the medical system seems ill-equipped to support. I was sent home from major surgery with very little guidance in hand about expected recovery progression, or diet. Fortunately the community was a great resource for this information.
On the anxiety producing side, let me start with this meme.
The population of a community like ColonTown is necessarily going to bias towards people who are currently in treatment, and the population of people currently in treatment will bias toward folks whose treatment is not necessarily getting them cured.
Survivorship bias in this case manifests in the survivors slowly draining out of the community, leaving behind mostly folks still actively in treatment.
There are a lot of people in Stage IV, keeping fucking going, who have harrowing tales. And there are people who were Stage II (like me) who against the odds progressed to Stage IV. It happens. And the thought that I could follow that path too, is frankly terrifying.
Anyways, as a result, reading posts from this wonderful supportive online community can sometimes throw me into a spiral of anxiety, because oh my god I have this terrible deadly thing. Which I do. But also probably I am going to be fine.
Probably?
Talk to you again soon, inshalla.
fiboa is a new collaborative project, introduced a couple of weeks ago, to improve the interoperability of farm field boundary data and other associated agricultural data. This post is the follow-up to our previous deep dive into the core specification and its extensions. In that post, we mentioned that fiboa is not just a specification; it's a complete system. It includes the entire ecosystem of data adhering to the specification, the discussions and conversations that evolve the specs, and of course, the community of people who are building it all together. In this post, we introduce the initial tools, data, and community that form that ecosystem.
The goal of fiboa isn't to create a data schema - the schema is a means to get at the goal of more data, and more open data, about field boundaries and agriculture to help us make better decisions. And indeed the best way to make a good data schema is not to go into a room and try to create the most perfect ontology - it's to actually work with data and evolve the core specification and extensions to better represent real-world data. Our next step is to work with a number of organizations to ensure that their data can be represented in fiboa. To jump-start this process, we've converted a number of existing datasets and made them available on Source Cooperative.
For those unfamiliar with Source, it’s a data hosting utility provided by Radiant Earth, built with great support for cloud-native geospatial formats. We’ll likely put most public datasets up on Source, as it’s a user-friendly platform. Anyone else is also welcome to host their fiboa-compliant data there. You can also easily host them on any cloud or simply use them locally, but it can be quite beneficial to put larger data up there, as it becomes easy for users to download just the subset of data they want.
The first of the datasets that was converted was Field Boundaries for North Rhine-Westphalia (NRW), Germany. It was followed by 1.3 million field boundaries for Austria, plus boundaries for Berlin / Brandenburg, Lower Saxony, and Schleswig-Holstein in Germany. It’s pretty easy to convert existing datasets (more details in the ‘tools’ section below), so if you’re interested in contributing to fiboa, then converting and uploading a new fiboa dataset on Source is a great way to start. There are a few potential datasets listed in the fiboa data repository tracker; if you’ve got ideas of other great datasets to contribute, don’t hesitate to add them to the tracker. We’re also hoping to get several commercial companies to contribute at least samples of their data implementing fiboa. The academic work that Taylor Geospatial Engine is funding is also going to harmonize and make publicly available some interesting datasets.
As part of the initial release of fiboa, Matthias Mohr has built several tools to make the ecosystem more immediately useful. The main tools are all available from the fiboa command-line interface (CLI). It can be installed by running pip install fiboa-cli on any command line with Python 3.9 or above. It works like any command-line tool: just type 'fiboa' on your command line and explore from there.
The most important command is likely validate, which lets you check any GeoJSON or GeoParquet file to confirm whether it is a valid fiboa file. This operation is key to ensuring interoperability. It does no good to just have people 'try' to implement the specification with no way to ensure that they are doing it correctly. Validation will ideally be written into any workflow with fiboa data, to ensure all tools can count on it being represented properly.
All the validation is completely dynamic. fiboa files themselves point at the versions of the core and extensions that they declare support for, which means that when there is a new release they can immediately point at the new release file's location and the validators will check against the latest. There does not need to be a new release of the validators for each new extension release, since validation automatically follows where the file points. The validator works against both local and remote files.
The ‘describe’ tool is a favorite of mine, to quickly get a sense of the data.
The fiboa ‘create’ tool is also quite useful, as it can take a GeoJSON file and the intended schemas and transform them into the GeoParquet version.
And then there are a bunch of utilities to help with creating fiboa files and metadata. They include create-geojson, which makes a fiboa GeoJSON from a fiboa GeoParquet; create-geoparquet, which does the opposite; and fiboa jsonschema, which will write out the valid JSON Schema for a given fiboa file.
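To make that workflow concrete, here is a minimal sketch of driving those commands from Python. The subcommand names (validate, describe, create-geoparquet) are the ones named in this post, while the flags and file names are assumptions, so check fiboa --help for the specifics of your installed version.

```python
# Minimal sketch: driving the fiboa CLI from Python. The subcommands are the
# ones named in this post; the flags and file names are illustrative
# assumptions, so consult `fiboa --help` for the exact options.
import subprocess

def fiboa(*args):
    cmd = ["fiboa", *args]
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

fiboa("validate", "fields.json")                                    # is this valid fiboa GeoJSON?
fiboa("create-geoparquet", "fields.json", "-o", "fields.parquet")   # GeoJSON -> GeoParquet (flag assumed)
fiboa("validate", "fields.parquet")                                 # validation also works on GeoParquet
fiboa("describe", "fields.parquet")                                 # quick look at schema and sample rows
```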
Matthias has put together a few nice tutorials, and all of these fiboa tools are covered in the ‘CLI Basics tutorial’. You can read the text version of it, or watch the video. The tutorial also covers this great Jupyter notebook that demonstrates how to do some analysis of fiboa data.
Video tutorial on the fiboa command-line interface on YouTube.
There is also a new converter tool, which can take non-fiboa data and help you turn it into fiboa data. Each converter must be implemented as part of the CLI library, but once it's there it's available for anyone to convert that 'official' data into its fiboa version. Currently, there are only converters for open datasets in Germany and Austria, but it is relatively easy to add one, and doing so makes it simple for any user of an agricultural dataset in another region to convert it. Matthias put together a great tutorial on how to easily create a new converter using the templates; a text version and video are available for this process as well. If you do create a new converter, please contribute it to the project so others can also use it.
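As a rough sketch of what running a built-in converter might look like, assuming a convert subcommand and a short converter identifier for the NRW dataset (both names are assumptions; the CLI help and the converter tutorial have the real ones):

```python
# Rough sketch of using a built-in converter to turn an official dataset into
# fiboa GeoParquet. The subcommand name ("convert") and the converter id
# ("de_nrw" for North Rhine-Westphalia) are assumptions based on this post;
# run `fiboa --help` to see what is actually available in your version.
import subprocess

subprocess.run(
    ["fiboa", "convert", "de_nrw", "-o", "nrw_fields.parquet"],
    check=True,
)
```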
While there’s a great start to the ecosystem above, we’re still in the early days. The key now is building an amazing community that can make this effort far bigger than we’ve dreamed of. We did the initial workshop in person to build up trust and connection as humans in an initial group, but our next goal is for the community to be centered online, in the style of the best open communities that we all know about. We hope to do more in-person events, but as a way to enhance the primary online collaborations. It will take some time to transition, and this is where we could use help! Mostly by joining in the effort, especially if you weren’t in the original workshop. We will not see this effort as successful until there are more people contributing than there were at the first workshop.
We’ve just recently started forming the rituals and communications that form the community. The center will certainly be the fiboa organization on GitHub. This is where the core specification, most of the extensions, the core tools, and the discussion forum live. It’s also where we’re doing all the project management of all the different pieces. We aim to add tags in our projects on ideas for good first tasks to help beginner contributors find easy ways to get involved.
We do think some real-time discussion can really move things forward. We have bi-weekly Zoom meetings for progress checks (see project board). Yet we strive to follow best practices for online communities, aiming to make all decisions fully online in the repositories, and posting everything that happens, so that the Zoom meetings are a complement to the core running of the project and not how the project is run. We’ll also aim for some ‘break out’ sessions to enable higher bandwidth collaboration on key topics like defining new extensions, delving into core spec questions, or giving deeper demo sessions. Anyone and everyone is welcome to join both the bi-weekly calls and any break-out sessions. You can get these calls added to your Google calendar by joining the fiboa Google Group. Join the #fiboa Slack channel on the Cloud Native Geospatial Slack for async / chat communication.
There are many ways to contribute, some mentioned above. If you'd like to learn more (even if you aren't ready to contribute), join our Slack or jump into the bi-weekly meetings. All are welcome to just join and observe. And if you just want to 'do something', then the best way is to take an existing field boundary dataset, convert it to fiboa, and upload it to Source Cooperative. Matthias's tutorials should guide you. If any questions remain, please feel encouraged to ask them on the fiboa Slack channel. We are also planning to publish more information, documentation, and tutorials in the future. But until then, just jump in and get in touch.
We look forward to working with you, and building this project together!
Back to entry 1
I’m still here.
There’s nothing like spending some time in hospital to get a visceral reminder that “well, things could be a whole lot worse”. There are plenty of people dealing with far more dire scenarios than a little surgical recovery with a discharge in a handful of days.
My stay was scheduled for 3-5 days, and I was discharged in 4, a testament to my good health going in and the skill of my surgeon in doing the least harm while still doing what needed to be done.
It was still a long and eye-opening four days.
Psychologically, the worst time was the 24 hours before they put me under. Fasting, and hard antibiotics, and bowel prep, and anticipation, and an early start. I shed some tears in the pre-op while waiting to roll into the OR, for sure.
The four days of recovery in hospital included all sorts of new indignities, from catheterization to shitting the bed, from adult diapers to the generalized humiliation of being unable to move, pinned down by gravity and pain. Good personal growth moments. This is staying alive, in all its messiness, a process of continuous compromise and self-adjustment.
Now I have two recoveries to work on.
The near term one is healing from the surgery. They put a breathing tube down my throat, catheterized me, cut six little port-holes into my abdomen to stuff laparoscopic tools through, inflated my abdominal cavity so they could see what they were doing, cut out the majority of my rectum and the surrounding lymph nodes, stapled the sigmoid colon to whatever rectum was left, inflated my bowel to test that joint, and then closed me back up again. My middle is in rough shape. But it should all recover to more-or-less its previous strength, over several weeks.
The longer term one is how my GI system adjusts to missing several critical centimeters at the end. This is where permanent changes loom, the bits that worry me. How long I can go between trips to the bathroom and how much control I have over things when I need to, are recovery processes that will play out over several months.
So far, for someone in their first week of recovery, I think I am doing well.
I am trying not to worry too much about the whole cancer part of this journey, which is still in the grey area until we get pathology back on the parts they took out of me. That will determine whether I am due for several months of post-operative chemotherapy, or move directly to monitoring.
Talk to you again soon, inshalla.
Over the last few decades, considerable effort has been placed in creating digital virtual worlds, with applications ranging across engineering, geography, industry, and translation. More recently, with the growth of computational resources and the explosion of spatial data sources (e.g., satellite imagery, aerial photos, and 3-dimensional urban data), creating detailed virtual urban environments or urban digital twins has become more widespread. However, these works emphasize the physical infrastructure and built environment of urban areas instead of considering the key element acting within the urban system: the humans. In this paper, we would like to remedy this by introducing a framework that utilizes agent-based modeling to add humans to such urban digital twins. Specifically, this framework consists of two major components: 1) synthetic population datasets generated with 2020 Census Data; and 2) a pipeline for using the population datasets for agent-based modeling applications. To demonstrate the utility of this framework, we have chosen representative applications that showcase how digital twins can be created for studying various urban phenomena. These include building evacuations, traffic congestion, and disease transmission. By doing so, we believe this framework will benefit any modeler wishing to build an urban digital twin to explore complex urban issues with realistic populations.
Keywords: agent-based model, geosimulation, urban digital twins
The effect of the recent COVID pandemic has been significantly curtailed with the introduction of vaccinations. However, not everyone has been vaccinated, for a multitude of reasons. For example, people might be influenced by what they read online or the opinions of others. To explore the changes in people's views on vaccination, we have developed a geographically explicit agent-based model utilizing opinion dynamics. The model captures people's opinions on COVID vaccination and how this relates to actual vaccination trends. Using the entire state of New York with a population of over 22 million agents, we model vaccination uptake from January 1, 2021, until May 15, 2022. Agents within the model synthesize information from the other agents they are connected with, either in physical space or cyberspace, and decide whether or not to vaccinate. We compare these vaccination statuses among different age groups with actual vaccination rates provided by New York State. Our results suggest that there is an interplay between different spaces and ages when it comes to agents making a decision to vaccinate or not. As such, the model offers a novel way to explore vaccination decisions from the bottom up.
Keywords: Agent-based modeling, Covid, Vaccine, Geosimulation, Social Networks
Smells can shape people's perceptions of urban spaces, influencing how individuals relate to the environment both physically and emotionally. Although the urban environment has long been conceived as a multisensory experience, research has mainly focused on the visual dimension, leaving smell largely understudied. This paper aims to construct a flexible and efficient bottom-up framework for capturing and classifying perceived urban smells from individuals based on geosocial media data, thus increasing our understanding of this relatively neglected sensory dimension in urban studies. We take New York City as a case study and decode perceived smells by teasing out specific smell-related indicator words through text mining and network analysis techniques from a historical set of geosocial media data (i.e., Twitter). The dataset consists of over 56 million data points sent by more than 3.2 million users. The results demonstrate that this approach, which combines quantitative analysis with qualitative insights, can not only reveal "hidden" places with clear spatial smell patterns, but also capture elusive smells that may otherwise be overlooked. By making perceived smells measurable and visible, we can gain a more nuanced understanding of smellscapes and people's sensory experiences within the urban environment. Overall, we hope our study opens up new possibilities for understanding urban spaces through an olfactory lens and, more broadly, multi-sensory urban experience research.
Keywords: Smellscape, Urban smells, Geosocial media, Text mining, Network analysis, Multi-sensory urban experiences.
Last but not least, Boyu Wang presented his work entitled "Simulating urban flows with geographically explicit synthetic populations". In this talk, Boyu showed how a deep learning spatial-temporal urban flow model is trained to predict the aggregated inflows and outflows within regions, and how these predictions feed directly into an agent-based model.
Abstract
Urban human mobility is an active research field that studies movement patterns in urban areas at both the individual and aggregated population levels. Through individuals' movements, higher-level phenomena such as traffic congestion and disease outbreaks emerge. Understanding how and why people move around a city plays an important role in urban planning, traffic control, and public health. An abundance of agent-based models has been built by researchers to simulate human movements in cities, often integrated with a GIS component to realistically represent the study area. In this work we build a geographically explicit agent-based model where agents move between their homes and workplaces, to simulate people's daily commuting patterns within a city. In order to build this model, we develop a geographically explicit synthetic population based on census data. A deep learning spatial-temporal urban flow model is trained to predict the aggregated inflows and outflows within regions of the study area, which are subsequently used to drive individual agents' movements. To validate results from the agent-based model, agents' movements are aggregated and evaluated along with the urban flow model. Commuting statistics are also collected and compared to existing travel surveys. As such we aim to demonstrate how urban simulation models can be complemented by recent advancements in GeoAI techniques. Conversely, the aggregated deep learning model predictions can be investigated at a fine-grained individual level. This extends traffic pattern forecasting from just looking at the patterns to the processes that lead to these patterns emerging.
Keywords: Agent-Based Modeling, Urban Flow, GeoAI, Urban Simulation, Synthetic Populations
References:
Yin, F., Jiang, N. and Crooks, A.T. (2024), Modeling Covid Vaccination Uptake in New York State: An Agent-based Modeling Perspective, The Association of American Geographers (AAG) Annual Meeting, 23rd-27th April, Honolulu, HI. (pdf)
Jiang, N., Crooks, A.T., Wang, B. and Yin, F. (2024), Populating Digital Twins with Humans: A Framework Utilizing Artificial Agents, The Association of American Geographers (AAG) Annual Meeting, 23rd-27th April, Honolulu, HI. (pdf)
Chen, C., Poorthuis, A. and Crooks, A.T. (2024), Mapping the Invisible: Decoding Perceived Urban Smells through Geosocial Media in New York City, The Association of American Geographers (AAG) Annual Meeting, 23rd-27th April, Honolulu, HI. (pdf)
Wang, B. and Crooks, A.T. (2024), Simulating Urban Flows with Geographically Explicit Synthetic Populations, The Association of American Geographers (AAG) Annual Meeting, 23rd-27th April, Honolulu, HI. (pdf)
Medicine is a vast industry that is evolving every day. Like any evolving industry, the medical industry employs LiDAR technology to ease its operations. The spatial data collected by LiDAR tools is extensively used in medical analysis and other pharmaceutical tasks. In this article, we shall discuss some of the typical applications of […]
The post Applications of LiDAR in Medicine first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
Last week, we introduced fiboa, a collaborative project with the Taylor Geospatial Engine (TGE) designed to standardize farm field boundary data and bootstrap an 'architecture of participation' around agricultural and related data. The center of fiboa is a specification for representing field boundary data in GeoJSON & GeoParquet in a standard way, with optional 'extensions' that specify additional attributes. But we believe that thinking of fiboa as 'just' a specification sells it short. fiboa is the entire ecosystem of data adhering to the specification, tools to help convert data (including using AI models to in turn create more data), the discussions and conversations that evolve the specs, and of course the community of people who are building it all together.
This blog post dives into the heart of fiboa: the core specification and its extensions. We’ll explore the core attributes that define this format and how the extensions enable interoperability of all types of information that can be associated with a field boundary. You should start with the Introducing fiboa post if you’ve not read it, as it articulates the overall philosophy behind the project. This post goes deep into the specification & extensions, and then we’ll follow up with the current state of tools, data, and community.
The core of fiboa is quite simple: it is a set of definitions for attribute names and values. One clear example is area. It's quite common for geospatial files representing field boundaries to have a column for the area of the field, but it's often called different things: area, area_ha, totalArea, etc. And even if it's called the same thing, the actual data definition could be different: area could easily be in acres or hectares, or even something else. So what fiboa does is pick a definition; in our case, area is in hectares and must be a 'float' between 0 and 100,000. Any data that implements fiboa and successfully validates can then be definitively interpreted as being in hectares.
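As a tiny illustration of what such a definition buys you, the sketch below normalizes a raw area value into the fiboa convention (hectares, a float between 0 and 100,000); the unit labels and conversion handling are my own assumptions, not part of the specification.

```python
# Tiny sketch: normalizing an arbitrary source "area" value into the fiboa
# definition (hectares, float, between 0 and 100,000). Source units vary per
# dataset; the unit labels handled here are assumptions.
ACRES_PER_HECTARE = 2.47105

def to_fiboa_area(value: float, unit: str) -> float:
    """Convert a raw area value into hectares and sanity-check the fiboa range."""
    if unit == "ha":
        area_ha = float(value)
    elif unit == "acres":
        area_ha = float(value) / ACRES_PER_HECTARE
    elif unit == "m2":
        area_ha = float(value) / 10_000.0
    else:
        raise ValueError(f"unknown unit: {unit}")
    if not (0 < area_ha < 100_000):
        raise ValueError(f"area {area_ha} ha is outside the range fiboa allows")
    return area_ha

print(to_fiboa_area(160.0, "acres"))  # roughly 64.75 ha
```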
fiboa Core Spec, at github.com/fiboa/specification/tree/main/core
fiboa specifies the attributes in a human-readable form as shown above, along with a machine-readable YAML file. Then there are folders for GeoParquet and GeoJSON outputs that contain official examples and specs. This means that validation tools can take any data in those formats and report whether it properly implements the core fiboa data schema (along with any extensions - more on those soon).
Validation of Field Boundaries for North Rhine-Westphalia (NRW), Germany using the fiboa CLI
GeoJSON generally works best for things like API responses or transferring small amounts of data. GeoParquet shines when storing or moving any sizable amount of data since it is a much faster and more compact format. GeoParquet is a newer format and there is not yet universal tool support, but a big benefit is it can be stored on the cloud and clients can easily stream just the bits they need. Major data projects like Overture Maps are supporting it and the ecosystem is growing fast, so we decided to embrace it as we envision all global fields represented in fiboa, and billions of polygons are much better served by a more modern cloud-native geospatial format. For the TGE Field Boundary Initiative, we’re using Source Cooperative as our primary data infrastructure, and it works great with GeoParquet.
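To illustrate the streaming point, here is a minimal sketch of pulling only a few columns out of a (possibly cloud-hosted) GeoParquet file with pyarrow; the file path and the exact column selection are assumptions, not an official fiboa example.

```python
# Minimal sketch (not an official fiboa example) of why GeoParquet is handy for
# large field-boundary collections: a client can read just the columns it needs
# instead of downloading the whole file. Path and column names are assumptions.
import pyarrow.parquet as pq

table = pq.read_table(
    "fields.parquet",  # could also be an s3:// URI, or an https:// URL via fsspec
    columns=["id", "area", "determination_datetime"],  # skip geometry if it isn't needed
)

print(table.num_rows, "field records")
print(table.column("area").to_pylist()[:5])  # first few areas, in hectares per the core spec
```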
Thankfully it is quite easy to transform any fiboa data into other geospatial formats like GeoPackage or Flatgeobuf, as the attribute names and values will be retained. We do not recommend using Shapefile due to various technical limitations of the format. Unofficial validators may emerge for those, or we could consider officially supporting them - we just wanted to start with a small core.
The number of attributes in the core is quite small, and that's by design. The idea is that most of the 'interesting' data about the field will be in extensions. So even something that many people would consider a core property, like 'crop classification', will go in extensions. This is so that the definitions can evolve more easily, and so that people aren't put off from adopting fiboa just because they have their own crop classification system that works better for their use case. The extensions give the possibility of several crop classification extensions. Practically, we do hope that one main crop classification extension emerges, and that will likely happen if the largest, most valuable datasets all use the same extension. But we don't believe the small group of people involved at the start can get all the core attributes completely right. Indeed, we don't even believe that there is one 'true' answer to the right data schema for agricultural data. So our approach is to create the tools for everyone to define what they need and then validate against their own extensions. Naturally, some frequently used 'core extensions' will emerge. Much of the inspiration for this comes from the STAC specification, where several well-used extensions have emerged, and we expect the same for fiboa.
As of right now, the only required attributes in the core fiboa specification are id and geometry. Then we have optional attributes for spatial properties (bbox, area, and perimeter), and a couple of properties about the creation (determination_method and determination_datetime).
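To make that attribute list concrete, here is an illustrative, hand-written feature carrying those core attribute names, expressed as a Python dictionary and serialized to GeoJSON. Treat the property placement, units, and values as assumptions, and run real data through fiboa validate rather than relying on this sketch.

```python
# Illustrative only: a single feature carrying the core attribute names listed
# above. Property placement, the perimeter unit, and the determination_method
# vocabulary are assumptions; check validity with `fiboa validate`.
import json

feature = {
    "type": "Feature",
    "id": "example-field-001",                      # required
    "geometry": {                                   # required
        "type": "Polygon",
        "coordinates": [[[7.0, 51.0], [7.001, 51.0], [7.001, 51.001], [7.0, 51.001], [7.0, 51.0]]],
    },
    "properties": {
        "area": 0.78,                               # optional, hectares per the core spec
        "perimeter": 362.0,                         # optional, assumed to be metres
        "determination_method": "manual",           # value vocabulary is an assumption
        "determination_datetime": "2024-03-15T00:00:00Z",  # last time the field was observed
    },
}

print(json.dumps(feature, indent=2))
```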
The method used to create the boundary is quite important, particularly in AI use cases where you would not want to train an AI on data that itself was auto-created by AI from other imagery. The determination_datetime was the subject of much debate; we talked through various datetimes that people care about a lot, but decided it would be best to cover all the various datetime options in an extension that is explicit about what each time means. We did want to have one datetime in the core and coalesced on determination_datetime, which is the last time at which this particular field was observed, so you can tell whether a field is up to date or was created a while ago. We thought about making the datetime attribute required, but did not want to end up with a bunch of bad time data, as many datasets don't have precise time information and would just end up dumping 'something' in there.
The core will likely evolve a good bit, and feedback on these decisions is more than welcome, as this release mostly aims to start the conversation. When we feel it’s more settled we’ll likely call it ‘beta’, but there are still some big things to figure out, like what should be at the ‘collection’ level and what to do when different collections of data are merged.
I touched on the philosophy behind extensions above, and it's hopefully clear that extensions in fiboa aren't just extraneous information that doesn't really matter. The bulk of fiboa's value will be in extensions. There will likely be lots of different types of extensions: some that are generally accepted as the main way to do things in fiboa and widely understood by tools, and others that are very niche and not widely used but valuable to a small number of users (e.g. an extension specific to a company or organization to help them better validate their data).
Implementing an extension enables the dataset to make use of the ecosystem of fiboa tools, including validation to ensure that the values in a dataset meet the requirements of an extension. This, in turn, lets tools 'know' that a particular value in two different datasets means the exact same thing and can be combined. This should lead to much more innovation in tools that work with the data, since tool providers don't need to code against particular datasets or try to get everyone to convert their data into an ad hoc schema for the tool to work. Everyone can work towards a common target, creating a virtuous cycle: converting agricultural data to fiboa makes it work with more tools, more tools get created because there's more data in fiboa, and then more data gets converted because there are even more tools.
We put in quite a bit of work to make the process of creating extensions as easy as possible (and by ‘we’ I mean Matthias Mohr did a ton of awesome work, funded by TGE). Each extension is defined by a GitHub repository that contains all the information about the extension and publishes the schemas that tools directly call for validation. The cool thing is that Matthias created a ‘template’ where all you need to do is hit ‘use template’ and you’re 80% of the way to making an extension.
You edit the readme, customize a YAML template, and create an example GeoJSON file. Then, the repo template provides all the tools to convert it to GeoParquet, validate everything, and run continuous integration to ensure each new commit remains valid. When you hit ‘release’ for the repo, the schema is automatically published and any of the ecosystem tools can instantly start validating datasets against it. This was awesome to see in action at the initial workshop, as I was able to fully release a new extension and then validate data against it using the command-line tool in just a couple of hours.
Matthias has built a tutorial on creating a new extension, for anyone interested in trying to create one. You can find it at github.com/fiboa/tutorials/tree/main/create-extension, and it has a link to the video recording there as well.
So far we’ve only managed to create four extensions, but we aim for many more over the next few months. The one that saw a lot of effort during the workshop was the AI Ecosystem Extension.
It includes everything you'd want in a dataset to be able to do reliable machine learning with it, including things like the author, the quality, the confidence, and whether it was machine-generated. In time some of these things may migrate to more general extensions, but the idea was to first get down everything that an AI/ML tool would need to be able to use a fiboa dataset and run models with it. There was also work done in the workshop so GeoTorch could easily read in the data.
The other extension made in the workshop is the Tillage Extension. I worked with Jason Riopel of Bayer and Katie Murphy of the Donald Danforth Plant Science Center on this one. We started it as a ‘management practices’ extension but quickly realized that there were several attributes about tillage that people would use, so we broke it out into its own extension. I knew close to nothing about tillage, so it was cool to have them brainstorm what people want to know, what to call the attributes, and how to explain them. The extension is by no means ‘done’, but we managed to put out a 0.1 release. The next step will be to try to get one or more ‘real’ datasets converted, to validate that it works with existing data.
The INSPIRE and FLIK extensions define specific identifiers that are commonly used in the EU and Germany. Those two were added to support some of the first datasets converted to fiboa and hosted on Source Cooperative. I think that will be one of the main ways extensions get developed: start with a dataset to be converted, look at the attributes that aren't already in fiboa, and then figure out whether some of them are really 'common' ones that many datasets would want to represent. It's worth looking at other datasets, but since making an extension is so easy it's also great to just create the extension, solicit feedback as others hit similar problems, and evolve it in the open.
While each extension gets an individual repository, we also have an overall Extensions Repository at github.com/fiboa/extensions that lists all available extensions. The issues in the extensions repository serve as a tracker for potential new extensions. Right now we’ve only got three extensions listed as priorities, but the workshop generated many more ideas that we’ll post to the tracker soon. Some examples include yield, crop classification, soil moisture, phenology, irrigation, soil information, climate risk, harvest dates, deforestation, ownership information, surface temperature, etc.
Two proposed extensions that are pretty critical to get to soon are Identifiers and Timestamps. For IDs, we are particularly interested in things like Varda’s Global FieldID being represented easily, to help promote their awesome work. We know that many other ID schemes are important to people, so creating an extension to allow for those to exist within fiboa is a priority. For Timestamps, we anticipate a deep discussion of all the different types of time that people care about with fields. We started talking about it at the workshop, and it became clear it was a much bigger topic than we could handle in an hour or two. We also ticketed ‘management practices’, which will likely be broken into multiple extensions on cover crop, fertility, crop protection, manure, irrigation, residue management, etc. It may make sense to have an overarching extension that groups them together.
Extensions are generally the area that is ripest for collaboration, and we’re keen to get at least some initial alpha releases out. The recommended way to work on these is to start with one dataset that represents some additional data related to field boundaries and see how they do it. Ideally, at least a couple of datasets that represent the same attributes are found, and if they all do things similarly then it should be easy to determine what goes in the extension. If their approach differs, just pick the one that makes the most sense and is the most future-proof. The idea is to release ‘early and often’, and to get feedback through actually ‘doing’, not trying to gather all potential stakeholders in a room. Ideally, by a ‘1.0 release’ of any extension there are many different datasets in fiboa that use the extension, so we feel confident that it works well.
So I hope this post served as a solid introduction to the core fiboa specification. We'll aim to follow up very soon with details on the data, tools, and community that are just as important as the core spec. And after that we'll post more of the 'why' behind the initiative. If you're intrigued by what you've read, then please consider joining us! We certainly can't do it all alone, and this movement is only going to succeed beyond our dreams if we manage to attract far more people than the original group. We've got a lot of great momentum, and the amazing support of Taylor Geospatial Engine, but the goal is to use these next months to really bootstrap the community that will live beyond the initial Innovation Bridge Initiative.
To join the community check out our developer communication channels or just start digging into all the repos linked to from github.com/fiboa.
Exactly 7 years ago I had the chance to interview Kurt Menke. You may know Kurt from his books, like Discovering QGIS. You may know him from his consulting days over in New Mexico. You may know him from his interviews up here. At the end of the interview 7 years back Kurt said "For me […]
The post Where are they now? Kurt Menke appeared first on GeoHipster.
We are excited to announce fiboa (Field Boundaries for Agriculture), a new project we’re collaborating on with the Taylor Geospatial Engine (TGE) focused on improving interoperability of farm field boundary data and other associated agriculture data. We’re excited about the enormous potential of this project, and we’ve already started a community of people who share our excitement.
fiboa is the first concrete result from the TGE Field Boundary Initiative, which aims to enable practical applications of AI and computer vision to Earth observation imagery for a better understanding of global food security. The initiative has spurred collaboration between academia, industry, NGOs, and governmental organizations toward creating shared global field boundary datasets that can be used to create a more sustainable and equitable agriculture sector.
We worked with TGE to launch this effort in February at a workshop in St. Louis that brought together almost 20 different organizations including Microsoft, Google, Bayer, and the World Resources Institute. At the end of the two days, we shipped version 0.1 of the fiboa core specification (we have continued to work on it and it is now at version 0.2), which provides a common language for any dataset to describe field boundaries and a way to add extensions for ancillary data and metadata about fields.
We recently wrote about how commonly used data schemas are essential to enable data interoperability and collaboration on complex global challenges. fiboa is our first effort to put that thinking into practice by creating a common schema for farm field boundaries.
Farm fields are a foundational unit of production for any agricultural supply chain. Efforts to improve agricultural practices such as the European Union’s deforestation regulation (EUDR) will only succeed if we have reliable information about where food comes from, and many of the world’s most common food products like wheat, maize, potatoes, and soybeans originate from farm fields. Despite this, there is not yet a commonly accepted data schema to describe farm fields.
By convening a community of practitioners who work with field data, we hope to solve this problem quickly and practically, creating a shared language that will foster and accelerate innovation among everyone working to understand agricultural supply chains. A common schema will enable seamless exchange of data among a variety of tools rather than requiring data engineers to create mappings from one dataset to another.
We’re doing this now because we are at a point when it is possible to create and distribute planetary-scale field boundary data much more quickly and cheaply than ever before. Many field boundaries around the world can be derived by applying machine learning (ML) techniques to freely available satellite imagery. TGE has funded research teams led by Dr. Hannah Kerner at Arizona State University and Dr. Nathan Jacobs at Washington University in St. Louis to accelerate research in this area and determine if ML-derived data can be commercially viable.
Simultaneously, advances in cloud-native vector data formats like GeoParquet also make it trivial to share the large volumes of field boundary data that can be produced from satellite imagery. We will be working with researchers who produce field boundary data to get it into the fiboa specification and then publish it in cloud-native formats on Source Cooperative.
Though we don’t have a formally published set of collaboration principles, this declaration from the Internet Engineering Task Force (IETF) is a good summary of our approach:
“We reject: kings, presidents, and voting. We believe in: rough consensus and running code.”
We place higher value on practical, working solutions and the broad agreement of participants than trying to vote for the perfect data schema. The principles that guide us are similar to the Core Principles we came up with for the SpatioTemporal Asset Catalog, summarized here:
It is in this spirit that fiboa will be built in public with many members of our community throughout the coming year and beyond. While fiboa is merely a metadata specification, we see it as the basis for a robust architecture of participation that will allow many people and institutions to contribute the specifications, tools, and data that we need to improve our understanding of the global agricultural sector. For a deeper dive on what we’re up to see fiboa: Core Specification & Extensions and fiboa: The Ecosystem.
If you’re interested in getting involved, please feel free to contribute via the fiboa GitHub, join our Slack, or email us at [email protected]. We also invite you to a live tutorial presented by Matthias Mohr on Thursday April 25 at 12 pm ET and on Monday, April 29 at 11 am EST. These sessions introduce you to the fiboa CLI and demonstrate how to create a fiboa extension, and the recordings are now available at github.com/fiboa/tutorials/.
Light Detection and Ranging (LiDAR) is a detection system that uses lasers to examine the earth's surface. It uses airborne tools to collect spatial information. The technology uses laser light sent from a transmitter, which the objects in the scene reflect. The reflected light is detected and analyzed, thus providing the information required […]
The post Applications of Lidar in Agriculture first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
Tell Us About Yourself: I was born and raised in Mexico City. I have a master's in Spatial Planning [meaning absolute space], and I am a map enthusiast and a criminal analysis enthusiast; what better way to represent criminal behavior than using maps! I worked as a crime analyst for years in Mexico and I am […]
The post Maps and Mappers of the 2024 Calendar – Laura Angelica Bautista Montejano – April appeared first on GeoHipster.
Sessions Abstract:
There is an urgent need for research that promotes sustainability in an era of societal challenges ranging from climate change, population growth, aging, and wellbeing to pandemics. These findings need to be fed directly into policy. We, as a geosimulation community, have the skills and knowledge to use the latest theory, models, and evidence to make a positive and disruptive impact. These include agent-based modeling, microsimulation and, increasingly, machine learning methods. However, there are several key questions that we need to address, which we seek to cover in this session. For example: What do we need to be able to contribute to policy in a more direct and timely manner? What new or existing research approaches are needed? How can we make sure they are robust enough to be used in decision making? How can geosimulation be used to link across citizens, policy, and practice and respond to these societal challenges? What are the cross-scale local trade-offs that will have to be negotiated as we re-configure and transform our urban and rural environments? How can spatial data (and analysis) be used to support the co-production of truly sustainable solutions, achieve social buy-in and social acceptance, and thereby co-produce solutions with citizens and policy makers?
We’re going to talk about a DuckDB-Wasm web mapping experiment with Parquet. But first we need some context! Common Patterns Every application is different, and most architectures are unique in some way, but we sometimes see common patterns repeated. The diagram below shows a common pattern in web mapping. Figure 1: A common high-level architecture …
A DuckDB-Wasm Web Mapping Experiment with Parquet Read More »
The post A DuckDB-Wasm Web Mapping Experiment with Parquet appeared first on Sparkgeo.
In the past we have blogged about the challenges of agent-based modeling, but one thing we have not written much about is the challenge of uncertainty, especially when it comes to model calibration. This uncertainty is a challenge in situations where various parameter sets fit observed data equally well. This is known as equifinality, a principle or phenomenon in systems theory which implies that different paths can lead to the same final state or outcome.
In a new paper with Moongi Choi, Neng Wan, Simon Brewer, Thomas Cova and Alexander Hohl entitled "Addressing Equifinality in Agent-based Modeling: A Sequential Parameter Space Search Method Based on Sensitivity Analysis" we explore this issue. More specifically, we introduce a Sequential Parameter Space Search (SPS) algorithm to confront the equifinality challenge in calibrating fine-scale agent-based simulations with coarse-scale observed geospatial data, ensuring accurate model selection, using a pedestrian movement simulation as a test case.
If this sounds of interest and you want to find out more, below you can read the abstract of the paper, see the logic of our simulation and some of the results. At the bottom of the page, you can find a link to the paper along with its full reference. Furthermore, Moongi has made the data and code for the indoor pedestrian movement simulation and the Sequential Parameter Space Search algorithm openly available at https://zenodo.org/doi/10.5281/zenodo.10815211 and https://zenodo.org/doi/10.5281/zenodo.10815195.
Abstract
This study addresses the challenge of equifinality in agent-based modeling (ABM) by introducing a novel sequential calibration approach. Equifinality arises when multiple models equally fit observed data, risking the selection of an inaccurate model. In the context of ABM, such a situation might arise due to limitations in data, such as aggregating observations into coarse spatial units. It can lead to situations where successfully calibrated model parameters may still result in reliability issues due to uncertainties in accurately calibrating the inner mechanisms. To tackle this, we propose a method that sequentially calibrates model parameters using diverse outcomes from multiple datasets. The method aims to identify optimal parameter combinations while mitigating computational intensity. We validate our approach through indoor pedestrian movement simulation, utilizing three distinct outcomes: (1) the count of grid cells crossed by individuals, (2) the number of people in each grid cell over time (fine grid) and (3) the number of people in each grid cell over time (coarse grid). As a result, the optimal calibrated parameter combinations were selected based on high test accuracy to avoid overfitting. This method addresses equifinality while reducing computational intensity of parameter calibration for spatially explicit models, as well as ABM in general.
Keywords: agent-based modeling, equifinality, calibration, sequential calibration approach, sensitivity analysis.
Figure: Detail model structures and process of the simulation.
Figure: Pedestrian simulation ((a) Position by ID, Grouped proportion – (b) 0.1, (c) 0.5, (d) 0.9).
Figure: Multiple sub-observed data ((a) # grid cells passed by each individual, (b) # individuals in 1x1 grid, (c) # individuals in 2x2 grid cells).
Figure: Validation results with train and test dataset ((a) Round 1, (b) Round 2, (c) Round 3).
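For readers who want the intuition in code, below is a deliberately generic sketch of sequentially narrowing a parameter space against multiple observed outcomes. It is not the paper's SPS implementation (see the Zenodo links above for that); it only illustrates how additional outcomes can break ties between otherwise equifinal parameter sets.

```python
# Generic illustration (NOT the paper's SPS implementation): candidate parameter
# sets that fit a coarse outcome equally well are narrowed further by checking
# them, in sequence, against additional observed outcomes.
import itertools

def error(simulated, observed):
    """Simple absolute-error score between a simulated and observed outcome."""
    return abs(simulated - observed)

def sequential_search(candidates, simulate_fns, observations, tolerance=0.1):
    """Keep only candidates whose simulated outcomes stay within tolerance,
    checking one observed outcome at a time (coarse first, finer later)."""
    surviving = list(candidates)
    for simulate, observed in zip(simulate_fns, observations):
        surviving = [p for p in surviving if error(simulate(p), observed) <= tolerance]
        print(f"{len(surviving)} candidate parameter sets remain")
    return surviving

# Toy model: two parameters, three increasingly informative "outcomes".
candidates = list(itertools.product([0.1, 0.3, 0.5, 0.7, 0.9], repeat=2))
simulate_fns = [
    lambda p: p[0] + p[1],   # coarse aggregate outcome (many parameter sets match it)
    lambda p: p[0] * 2,      # finer outcome sensitive to the first parameter
    lambda p: p[1] * 2,      # finer outcome sensitive to the second parameter
]
observations = [1.0, 0.6, 1.4]  # pretend these came from observed data

print(sequential_search(candidates, simulate_fns, observations))
```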
Full Reference:
Choi, M., Crooks, A.T., Wan, N., Brewer, S., Cova, T.J. and Hohl, A. (2024), Addressing Equifinality in Agent-based Modeling: A Sequential Parameter Space Search Method Based on Sensitivity Analysis, International Journal of Geographical Information Science. https://doi.org/10.1080/13658816.2024.2331536. (pdf)
We created the Cloud-Native Geospatial Foundation because we’ve noticed rapid adoption of cloud-native geospatial formats, such as Cloud-Optimized GeoTIFF (COG), SpatioTemporal Asset Catalogs (STAC), Zarr, and GeoParquet. Both data providers and users enjoy time and cost savings when using cloud-native formats, and we believe there’s a need to help more people learn how to benefit from them.
Despite that, creating more cloud-native formats is a non-goal for us. There are plenty of people within our community working on cloud-native formats such as Cloud-Optimized Point Clouds (COPC), GeoZarr, and PMTiles. At this point, most use cases are covered by existing formats.1
In addition to the time and cost savings, one huge benefit of using cloud-native data formats is interoperability – the ability for different systems to share information easily. Common data formats are an essential part of interoperability, but we're starting to explore a new dimension of data that may be much more important to enabling interoperability: common data schemas and common identifiers.
This post is an effort to explain how common data schemas and identifiers can enable global cooperation and maximize the value of geospatial data.
Common data schemas refer to widely used ways to name and refer to the attributes and values within data products. Common identifiers are widely used ways to refer to unique entities in the world. To use a very simplistic example, a schema to describe a person could consist of first_name, last_name, passport_issuing_country, and passport_number. Because humans may share the same first and last names, we can't use names as unique identifiers, but we can expect that countries will not issue the same passport number to multiple people. Therefore, a globally unique identifier for a person could be made up of a combination of the values of passport_issuing_country and passport_number.
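As a toy illustration of that composite identifier idea, using the example field names above (the concatenation format is an arbitrary assumption):

```python
# Toy illustration of a composite globally unique identifier built from the
# example person schema above. The concatenation format is an arbitrary choice.
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    first_name: str
    last_name: str
    passport_issuing_country: str
    passport_number: str

    @property
    def global_id(self) -> str:
        # Names can collide, so the identifier uses only the passport fields.
        return f"{self.passport_issuing_country}:{self.passport_number}"

a = Person("Ada", "Lovelace", "GB", "123456789")
b = Person("Ada", "Lovelace", "CA", "123456789")
print(a.global_id, b.global_id)  # same names, distinct identifiers
```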
While simple and imperfect in many ways, we have developed several workable ways to describe and identify individual humans, which is foundational to things like travel, telecommunications, banking, and public safety. Let's compare that to the present state of open geospatial data.
If you were to download parcel data from 3 different adjacent counties anywhere in the country today, the data you get would likely all have different names for the attributes of those parcels. In Washington State, for example, King County shares parcel data in a column named PIN, Pierce County parcels are in a column named parcel_num, and Snohomish County's header is named PARCEL_ID. Likewise, each county uses different attributes with different names to describe those parcels. Because of this, if you wanted to combine the data into a single dataset containing parcels from different counties, you'd need to understand the meaning of each of their attributes and figure out how to translate them all to use consistent names.
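Here is a small sketch of that translation step using pandas; the shared column name parcel_id is an arbitrary choice, and the rows are placeholders standing in for real county downloads.

```python
# Sketch of the manual translation step described above: three county extracts
# with different parcel-id column names are renamed into one shared schema
# before being combined. The target name "parcel_id" is an arbitrary choice,
# and the toy rows stand in for real county downloads.
import pandas as pd

king = pd.DataFrame({"PIN": ["0000000001"], "county": ["King"]})
pierce = pd.DataFrame({"parcel_num": ["0000000002"], "county": ["Pierce"]})
snohomish = pd.DataFrame({"PARCEL_ID": ["0000000003"], "county": ["Snohomish"]})

# Per-source mapping from local column names to the shared schema.
renames = {"PIN": "parcel_id", "parcel_num": "parcel_id", "PARCEL_ID": "parcel_id"}

combined = pd.concat(
    [df.rename(columns=renames) for df in (king, pierce, snohomish)],
    ignore_index=True,
)
print(combined)
```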
Since parcel data is quite valuable, there are a number of companies who make it their business to acquire parcel data from official sources and transform it into a common data schema. At some level this makes sense – governments produce parcel data for their own local needs and have few incentives to spend time agreeing on schemas with other governments, and the market has found a way to solve the inefficiencies that come from this lack of coordination. But on another level, we think this is something we should try to fix.
As we’ve written before, merely opening data is not enough. The current state of merely making geospatial data available for download does not amount to “infrastructure” if it fails to enable interoperability. Government agencies (and the citizens they serve) benefit when it’s easy for them to share data with their neighbors and other stakeholders. If we can lower the cost of consolidating disparate datasets, it will be easier for us to cooperate on shared challenges.
Because publishing cloud-native data is as simple as uploading files to a commodity cloud object service2, it is now easier for data publishers to collaborate and experiment with new schemas. This ease of experimentation was a major contributor to the development and adoption of STAC. Contrast this with the state of numerous open data portals today that require rigid data models and are designed in ways that discourage experimentation at the schema level.
One way to solve this problem could be to create a platform that requires data providers to use the same data model rather than having them build their own. We have a great example of a collaborative effort to do just that: OpenStreetMap (OSM). Just like Wikipedia provides a consistent format and is loosely governed by a set of principles to create one huge encyclopedia for the world, OSM has created a shared space for anyone to contribute to one huge map of the world. Despite the immense value created by OpenStreetMap, we believe that a better model for geospatial data beyond mapping data is the open source ecosystem.
Open source software doesn’t rely on one big repository with one set of rules governing one community. It’s incredibly diverse, made up of many different repositories, created by people with diverse needs, with different values embedded into their approach. Some projects are large and benefit from many contributors. Others are relatively small tools maintained by just a few people. Some projects are esoteric and others are foundational to the entire Internet. A robust cloud-native spatial data infrastructure can share a similar structure, emerging from contributions made by many communities that are cross-dependent with one another.
The way to enable collaboration across such a diverse community is by providing foundational datasets, identifying a few core identifiers and geometries that everyone relies upon, and providing flexible schemas that enable different communities to meet their unique needs while speaking a common language. From our perspective, enabling diversity isn’t merely a nice thing to have, but it’s core to maximizing the value of geospatial data.
Once a common data schema is established, everyone publishing data in that format is contributing towards a collective understanding of our world. That sounds grand, but it's not impossible. It's similar to how languages develop. To do this, we propose starting by focusing on fundamental attributes of our environment that are easy to understand across different contexts.
We also don’t envision a single data schema that everyone has to align to. Instead, there’s a way to start with a small, common core of information that gives data providers the flexibility to use the pieces that are relevant to them and easily add their own. This approach is based on our experience building the SpatioTemporal Asset Catalog core and extensions approach.
If successful, this will not only make things easier for users who want to combine a few sets of data, but will enable the creation of new types of software and data-driven applications. One of the pitfalls of traditional GIS tools is that many GIS users have been satisfied as long as their data looked ok on a map. We love maps, but a map is an interface created to deliver data to human eyes. If we want to maximize the value of geospatial data, it’s no longer enough to display it on maps – we need to optimize it for training AI models.
Common schemas and identifiers will make it much cheaper to write software that brings together diverse data, runs models, makes predictions, and lets people try out different scenarios. The approach we encourage enables data to remain fully abstracted, allowing software developers to point their code or models at it and have it “just work” without needing to be a data engineer.
A clear example of such innovative software would be AI-based interfaces that realize the vision of a 'Queryable Earth'.
Right now, ChatGPT doesn’t know how many buildings there are in San Francisco, but this query is quite easy with a geographic information system, after downloading the building footprints dataset from data.sfgov.org.
Making geospatial data available in common schemas will make it easier for models like ChatGPT to answer questions about our environment, but it will also let us ask more challenging questions, like 'How many buildings are there in San Francisco over 200 feet tall?' Or even 'What percent of buildings in San Francisco are within 500 feet of a bus stop?' These are questions that a common data schema makes possible and potentially even easy. The first question just needs a definition of 'height' in any building dataset, and the second also needs a definition of 'bus stop'.
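To make the first of those questions concrete, here is a minimal sketch of what it could look like once building data carries a well-defined height field. The file name and the `height_ft` column are hypothetical stand-ins for a common schema, not the actual San Francisco dataset's attributes.

```python
import duckdb

# Hypothetical GeoParquet file of building footprints whose schema includes
# a well-defined height column (names are illustrative only).
con = duckdb.connect()
tall = con.sql(
    """
    SELECT count(*) AS tall_buildings
    FROM read_parquet('sf_buildings.parquet')
    WHERE height_ft > 200
    """
).fetchone()[0]
print(f"Buildings over 200 ft: {tall}")
```

With an agreed-upon meaning for 'height', that one-line filter is all the query needs; without it, every tool and every model has to guess which of a dataset's many columns to use.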
Thinking further, this approach would make it trivial to write a GPT that would work with any ‘building’ dataset, especially if the dataset used a data schema with well-defined fields.
The above is a GPT made with the San Francisco dataset to illustrate the point. Yes, the GPT provided an answer when asked about the number of buildings over 200 feet tall in San Francisco, but the answer is not actually right. The attribute it used for "feet" was actually a shortened name for P2010mass_ZmaxN88ft, which is defined in the dataset's PDF documentation as 'Input building mass (of 2010), maximum Z vertex elevation, NAVD 1988 ft'. ChatGPT couldn't find where to get the height of buildings, so it used another column that describes the mass of buildings, calculated from LiDAR. There are actually at least 11 potential height values in this dataset. Part of the challenge of defining common schemas will be to identify and prioritize sensible defaults, so we can create simpler tools that provide the results that most people expect, while allowing extensions for people who have more specific queries.
This general pattern should work for any common type of data and creates opportunities to improve the usability of data for climate use cases. We can imagine common schemas used to describe land parcels, pollution, trees, demographics, waterways, ports, etc.3
In particular, we are already focusing on ways to enable interoperability of agricultural data, finding common schemas for things like farm field boundaries, normalized difference vegetation index (NDVI), leaf area index, soil water content, yield predictions, crop type, and more.
Varda’s Global FieldID
This is where the combination of common schemas and identifiers becomes powerful. We have been collaborating with Varda, a group that has taken a collaborative approach to creating Global FieldID, a service that creates stable, globally unique identifiers for farm field boundaries. By merely providing a common way to refer to farm fields, Global FieldID can create more transparent agricultural supply chains and simplify regulation intended to encourage regenerative agriculture practices and prevent deforestation. Beyond those benefits, having a common way to refer to farm fields will dramatically lower the cost of collaboration on agricultural data.
Other great examples of pioneering work to create global identifiers include Open Supply Hub which creates unique IDs for manufacturing sites, and the Global Legal Entity Identifier Foundation which issues unique identifiers for legal entities all over the world.
A powerful benefit of common identifiers is how they allow different databases to refer to the same thing. A commonly used identifier can be used as a ‘join key’ that users can use to combine (or join) disparate datasets. This opens up a possibility to distribute information that’s spatial in nature, without having to always include geometry or location data.
For example, Planet provides data products that include Planetary Variables, like crop biomass and soil water content, that are updated daily. These data products are rasters, and many workflows involve downloading the full raster every day. But if we had a common schema and a common identifier for farm field boundaries, these variables could be easily summarized into a simple table that contains the variable value and the field ID. Rather than redundantly sharing or storing geometry data, users could just update soil water content as it relates to their field IDs each day.
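As a rough sketch of the join-key idea, the snippet below combines a hypothetical daily soil water content table (containing only a field ID, a date, and a value) with a separately published field boundary file. All file and column names here are assumptions for illustration, not Varda's or Planet's actual products.

```python
import duckdb

con = duckdb.connect()
# Boundaries (with geometry) are published once; the daily table carries no
# geometry at all, just values keyed by field ID.
joined = con.sql(
    """
    SELECT b.field_id, b.geometry, s.date, s.soil_water_content
    FROM read_parquet('field_boundaries.parquet') AS b
    JOIN read_parquet('soil_water_content_daily.parquet') AS s
      USING (field_id)
    """
)
print(joined.limit(5))
```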
Screenshot taken from Varda’s Global FieldID service showing field boundaries and their identifiers in southern Brazil.
Taking this further, these common identifiers provide a shared framework for everyone to create all kinds of new agricultural data products that are interoperable with other systems. Imagine a Kenyan entrepreneur who has access to local data sources and insights that allow them to develop accurate yield predictions for maize in their region – common field boundary data and identifiers would allow them to create a data product that is as simple as a table with predicted yield per field, keyed by the field's ID.
Similarly, someone focused on exposure data for disaster risk assessment in supply chains could take data from OpenSupplyHub and create a table that adds information about building materials and occupants to OpenSupplyHub production facility IDs.
Full communities could form around adding attributes to globally defined datasets. The GeoAsset Project could be streamlined by people collaborating globally to add ownership information to unique building identifiers in Overture, unique farm fields in Global FieldID, production facility IDs in OpenSupplyHub, or any kind of asset that can be assigned a unique identifier.4 This would allow someone to create a GPT that could answer questions like "Which suppliers in our supply chain are most exposed to flood risk?" and "Are there any development groups who we could work with to mitigate flood risk where we operate?"
We are going to be working out these ideas throughout the course of the year, diving deeper into use cases and showcasing examples of data products that put these ideas into practice on Source Cooperative.
We will specifically be working on issues related to air quality in collaboration with AWS and applying AI to agricultural data with the Taylor Geospatial Engine as part of their first Innovation Bridge program.
If you know of any good examples of common data schemas or groups working on creating common identifiers, we’d love to hear about them. Please write to us at [email protected].
If you enjoyed this, please consider watching Chris Holmes’s presentation at FOSS4G-NA 2023: Towards a Cloud Native Spatial Data Infrastructure. Or you can just read the speaker notes in his slide deck.
This is not to say we believe the existing formats will live forever – we welcome more innovation in cloud-native formats and will support members of our community as they explore new formats. ↩︎
And we’re working hard to make hosting data in the cloud as easy as possible through Source Cooperative. ↩︎
We want to reiterate that we know none of this will be simple (n.b. these two great papers: When is a forest a forest? and When is a forest not a forest?), but we believe it is possible. ↩︎
The Radiant Earth post Unicorns, Show Ponies, and Gazelles argues that we need to create new kinds of organizations that can create and manage global unique identifiers to make this vision a reality. ↩︎
On Monday, we had our regular GeoParquet community meeting, and everyone agreed it’s a pretty exciting time, but that we need to tell people more about what we’re up to. We’re feeling ‘feature complete’ for a version 1.1 release of the specification, and so wanted to give all implementors a heads up so they could try it out and give any last feedback. And to also just share what’s been cooking. We’ll have a full announcement release when the 1.1 release goes out, so consider this a bit of a preview.
The focus for version 1.1 has been on 'spatial optimizations'. We actually decided not to include any spatial index or hints in GeoParquet 1.0, which may seem surprising, but we really wanted to focus on interoperability – making sure that everyone writing geospatial data into Parquet would write it the same way. We knew that there was a whole lot that could be done to make it a better spatial format, but we wanted to give more time for experimentation with different approaches. In the early days of GeoParquet (version 0.2) there was a really great talk from Eugene Chopish that explored the various possibilities. If you want a really great deep dive, do check out his talk on YouTube.
Two of the ideas he proposed saw significant community experimentation and have proven to be quite useful. Both were proposed as Pull Requests, and after extensive discussion and implementation they have both landed on ‘main’ and will soon form the basis of the 1.1 release.
The first introduces a bounding box column, and the second brings GeoArrow encoding as an option.
One key insight has been that Parquet is such a great format that we can use some properties inherent to it instead of just adding a full spatial index.
Parquet is a columnar format, meaning it's organized by columns instead of rows. This makes it very fast to access only one or two fields, since the rest don't need to be read. It also uses a construct called a 'row group' to provide fast access to a set of rows. A row group is more of a 'chunk' of rows, usually at least hundreds, though you can set the size of your row groups fairly easily.
The cool thing is that there are lots of built-in 'summary stats' for each row group, so Parquet readers can easily jump to just the row groups they need.
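For readers who want to see those statistics directly, here is a small sketch using pyarrow to print the per-row-group min/max values that a reader can use to skip chunks; the file path is a placeholder.

```python
import pyarrow.parquet as pq

# Print the min/max statistics Parquet stores for every column chunk in
# every row group ('example.parquet' is a placeholder path).
metadata = pq.ParquetFile("example.parquet").metadata
for rg_index in range(metadata.num_row_groups):
    row_group = metadata.row_group(rg_index)
    for col_index in range(row_group.num_columns):
        column = row_group.column(col_index)
        if column.statistics is not None:
            print(rg_index, column.path_in_schema,
                  column.statistics.min, column.statistics.max)
```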
What this meant was that you could actually spatially optimize a GeoParquet file without needing a defined spatial index construct. So with GeoParquet 1.0 you could have a file that loaded in QGIS like this. You’ll notice that the whole world just loads everywhere at once.
This is atypical for spatial data. We’re all used to files loading something more like this, where the chunks are loaded spatially:
The cool thing is that both of the files being loaded are GeoParquet – the files are just organized a bit differently. The different loading also means that clients can access the chunks more efficiently, as long as the file is set up right.
All you need to do is order things spatially. In early experiments people would add a space-filling curve value like a quadkey or an S2 cell as an extra column, and then use that in their query. And it worked well! The Parquet file would use the stats for that column efficiently, so you could just add the quadkey to your query and you'd effectively have a spatial index.
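A minimal sketch of that early pattern might look like the following, using the mercantile library to compute a zoom-12 quadkey per feature and then sorting by it; the DataFrame columns and the zoom level are arbitrary choices for illustration.

```python
import mercantile
import pandas as pd

# Hypothetical features with centroid longitude/latitude columns.
df = pd.DataFrame({"lon": [-122.42, 2.35], "lat": [37.77, 48.86]})

# Compute a zoom-12 quadkey for each feature and sort by it, so that
# spatially nearby features land in the same row groups.
df["quadkey"] = [
    mercantile.quadkey(mercantile.tile(lon, lat, 12))
    for lon, lat in zip(df["lon"], df["lat"])
]
df = df.sort_values("quadkey")
```

A query could then filter on quadkey prefixes, and the Parquet stats on that column would do the heavy lifting.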
We thought about standardizing on something like this, but it wasn’t clear which one to pick.
But a more sensible way emerged. Overture’s first release (which wasn’t even yet GeoParquet, just Parquet) had a BBOX column, using the ‘struct’ construct of Parquet to put all four values in a single column.
The nice thing about this is that you can order / organize the data in any spatial way that you want, and Parquet readers can use the stats on the x and y values to efficiently get the right data.
Early experiments supported the notion that if you get the right bbox struct and can do 'predicate pushdown' (i.e. make use of the nested structures), then the performance is stellar: for example, 1.5 seconds versus 4 minutes against large datasets. The big question for us was whether enough readers would support reading the 'struct'.
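To illustrate the kind of query involved, here is a sketch of a bounding-box filter against the struct, assuming the column is named bbox with xmin/ymin/xmax/ymax fields; the file path and coordinates are placeholders.

```python
import duckdb

con = duckdb.connect()
# Filtering on the struct's fields lets readers skip entire row groups
# whose bbox statistics fall outside the query window.
result = con.sql(
    """
    SELECT *
    FROM read_parquet('buildings.parquet')
    WHERE bbox.xmin > -122.52 AND bbox.xmax < -122.35
      AND bbox.ymin >  37.70 AND bbox.ymax <  37.83
    """
)
```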
The alternative was to just put the 4 bounding box values at the top level, which felt not nearly as clean and didn't really take advantage of what Parquet offers.
Thankfully our big survey revealed that nearly every major implementation that we want to support GeoParquet could take advantage of the stats from the struct for more efficient indexing. And the discussion even nudged a few implementations to fully support the predicate pushdown.
The changes in the spec are actually quite minimal, just a couple new paragraphs explaining how to represent the bounding box as a struct. The spec does not say anything about how to spatially organize the data, but most tools that support writing GeoParquet data with the bounding box will automatically spatially optimize the data.
The feature is ‘opt-in’ – you can make compliant GeoParquet without adding the BBOX. But we suspect that most geospatial tools will start to understand it, and to also produce it by default. And since it’s based on native Parquet structures most non-spatial tools reading Parquet will also be able to take advantage of it as well.
The result is that any spatial filtering can be much faster. And if your data is large, and/or it’s accessed over the network, then the speed improvements can be dramatic. Jake Wasserman, who really drove this PR, shared some compelling results from making use of the new feature in Overture & DuckDB:
In short the time to query 891 rows from a 2.2 billion row dataset went from almost 2 hours to 34.75 seconds, a 191x improvement.
The other major improvement that just landed is GeoArrow support. GeoArrow is an incredible project, bringing geospatial support to Apache Arrow. The origins of GeoParquet actually rest in GeoArrow, which is a much more ambitious project, as it can lead to dramatic performance improvements in working with geospatial information. The recently announced lonboard project from Kyle Barron and Development Seed shows off some of what’s possible:
GeoArrow makes it possible to render millions of points directly in the browser, and also enables seamless passing of the data between different programming languages.
GeoParquet is generally used in conjunction with GeoArrow, as the format to store the data in, or to transfer it online, and having an option to encode GeoArrow directly means that it’s much faster to parse from GeoParquet. But using the encoding in GeoParquet also has a number of other advantages. The key is that GeoArrow introduces a columnar geometry format. Arrow and Parquet are both columnar formats, meaning that all the data is stored in columns, not rows. But the core of GeoParquet has been ‘Well Known Binary’, which is not a columnar format – it is just a ‘blob’, a binary that is opaque to any standard Parquet or Arrow tooling. The new Bounding Box column introduces a structure that isn’t opaque, the bounding points are columnar, and so standard tooling can make use of it. GeoArrow goes a step further and makes it so the geometry format itself is columnar.
The advantages of this are most apparent in the case of points. If you add a bounding box around a point then it ends up just replicating the point a couple more times. And if the point is well-known binary then you need that bounding box for efficient spatial queries. But if the point itself is columnar then the bounding box struct isn't really needed, as all the tooling can use the Parquet stats directly, since it's not an opaque blob. The advantages are most clear with points, where you don't need to replicate the data just for the bounds. But lines and polygons can also be represented in a columnar way, and the bounds for those can easily be calculated, just like the stats are constructed for other structs – reporting the min and max values.
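To make the contrast concrete, here is a small pyarrow sketch of a columnar point column next to an opaque WKB column. It is meant to illustrate the idea rather than reproduce GeoArrow's exact memory layout.

```python
import pyarrow as pa

xs = pa.array([-122.42, 2.35], type=pa.float64())
ys = pa.array([37.77, 48.86], type=pa.float64())

# Columnar points: a struct of x/y doubles, so generic Parquet/Arrow tooling
# can compute and use min/max statistics on the coordinates directly.
columnar_points = pa.StructArray.from_arrays([xs, ys], names=["x", "y"])

# WKB points: each geometry is an opaque binary blob, invisible to
# non-spatial tooling (bytes below are placeholders, not valid WKB).
wkb_points = pa.array([b"...", b"..."], type=pa.binary())

table = pa.table({"geom_columnar": columnar_points, "geom_wkb": wkb_points})
print(table.schema)
```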
There are a number of potential advantages to having a columnar geometry, and generally it should work better with tools that understand Parquet but don't know anything about 'geospatial'. It's always been a goal to include an encoding option for a columnar format, so it's great to see it land. Adoption of this will likely be slower than the bounding box column, as many tools will need to write a completely new parser (most libraries already have a WKB parser). But it will ultimately make things work faster, and open up more possibilities for tooling to stream directly from it. Eventually using the GeoArrow encoding may make it so you don't even need a bounding box column for spatial queries, but we anticipate it'll take a while for all the tooling to get there. The two features were designed to be completely complementary, so there is no problem using both.
A huge thanks to Joris Van den Bossche and Dewey Dunnington for all their work on GeoArrow, and for driving this PR to completion.
If you are an implementor of a GeoParquet tool we highly recommend trying out the new features and giving us feedback, as it'll be harder to change after the 1.1 release. We aren't cutting a 'beta' release, as we don't want there to be some small number of files that everyone needs to understand. So if you're implementing this, just use 1.1 as the version number, as we don't anticipate any changes. And if there are changes then all the tools will just update so they use it right.
If you’re a data user you can try out the bounding box column today – all the latest Overture releases support it. And you can use tools like DuckDB to make use of the column if you just form your queries right. GDAL/OGR is also implementing it, to automatically take advantage of the column if it’s there, and to be able to write out new data to it. GDAL/OGR has also been supporting GeoArrow, but there are a few small tweaks needed to get it to 1.1 GeoParquet compliance.
Thanks to everyone who contributed to these two pull requests and the tooling around them – it’s been a really great community effort.
Disasters have been a long-standing concern to societies at large. With growing attention being paid to resilient communities, such concern has been brought to the forefront of resilience studies. However, there is a wide variety of definitions with respect to resilience, and a precise definition has yet to emerge. Moreover, much work to date has often focused only on the immediate response to an event, thus investigating the resilience of an area over a prolonged period of time has remained largely unexplored. To overcome these issues, we propose a novel framework utilizing network analysis and concepts from disaster science (e.g., the resilience triangle) to quantify the long-term impacts of wildfires. Taking the Mendocino Complex and Camp wildfires - the largest and most deadly wildfires in California to date, respectively - as case studies, we capture the robustness and vulnerability of communities based on human mobility data from 2018 to 2019. The results show that demographic and socioeconomic characteristics alone only partially capture community resilience; however, by leveraging human mobility data and network analysis techniques, we can enhance our understanding of resilience over space and time, providing a new lens to study disasters and their long-term impacts on society.
Keywords: Wildfire, Community resilience, Network analysis, Resilience triangle, Human mobility data.
Figure: Resilience triangle. (a) The original resilience triangle (adapted from Bruneau et al., 2003); (b) The modified resilience triangle used in this study.
Figure: An overview of the research outline.
Figure: The distribution of degree centrality for each census block group colored by different clusters. (a) The Camp wildfire; (b) The Mendocino Complex wildfire.
Chen, Q., Wang, B. and Crooks, A.T. (2024), Community Resilience to Wildfires: A Network Analysis Approach by Utilizing Human Mobility Data, Computers, Environment and Urban Systems, 110: 102110. (pdf)
Effective land use and facilities design require in-depth knowledge of the site’s physiography, hydrology, climate, human geography, and infrastructure. Before breaking ground, evaluate your long-term facility’s location, environmental conditions, and surrounding area. Engineers and project planners conduct site surveys for this information. With modern GIS imagery and tools, we can now visualize scenery worldwide. GIS […]
The post Applications Of GIS Land Use Mapping first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
On February 7th and 8th, in collaboration with Earthmover, we held a Zarr sprint at the LEAP NSF Science and Technology Center at Columbia University in New York City. A wide array of contributors from government, academia, and industry came to the sprint, including people from NASA, CarbonPlan, Development Seed, Earthmover, Upstream Tech, Columbia University, Hydronos Labs, and Fused.
In this post, I give a very brief overview of each of the topic areas we discussed. More importantly, I link out to the open issues, pull requests, discussions, and meeting opportunities identified at the sprint for continued development.
The purpose of the sprint was to continue development of the Zarr specification. Zarr is a chunked, compressed, N-dimensional array format primarily designed for storing large numerical arrays efficiently. It is commonly used in scientific computing, geospatial, bioimaging, and data analysis contexts.
Enhancements to the Zarr specification that we discussed at the sprint are described below.
In this breakout session, the group engaged in a long technical discussion about a way to define arrays in a Zarr store as concatenations of other arrays in the store. You can read a draft Zarr Enhancement Proposal (ZEP) of the discussion here. Shoutout to Tom Nicholas for documenting this so well!
Joe Hamman led a group focusing on enabling support for V3 in Zarr-Python. This was part of an ongoing effort working toward Zarr-Python version 3.0 (roadmap).
The focus of this group was on closing outstanding issues on the roadmap and testing the development branch in common geospatial applications. Zarr-Python has traditionally been the canonical implementation of Zarr, and it is therefore a current priority since this effort delivers immediate impact to the largest swath of users, including those that use Zarr through downstream libraries (e.g. Xarray, Dask, Anndata, etc.).
In the Zarr pyramids breakout group, Thomas Maschler and Max Jones discussed the motivations for following the OGC TileMatrixSet 2.0 specification within the GeoZarr specification, which will be shared as a new issue to supersede GeoZarr Issue #30. They also discussed reading those TMS into rio-tiler using Xarray and started a refactor of ndpyramid to support the TMS specification.
Kyle Barron worked on a prototype for an alternate store for Zarr Python using new async Python bindings to Rust’s object-store project. You can see a prototype of object-store-based store implementation at zarr-python#1661.
Throughout the sprint, the GeoZarr focus group, led by Brianna Pagán, worked on examining the interoperability of GeoZarr and different existing tooling and store support. You can see the table here.
One of the biggest realizations was that ArcGIS has a lot of existing support for Zarr, which is really exciting news! For other tools, there is still work to be done, especially for GeoTIFF-like data being stored in Zarr, which translates to updates needed within the GeoZarr specification. For example, there are functionality issues tied to support or lack thereof for specific compression algorithms. The GeoZarr Steering Working Group is working on providing a list of supported compressions for commonly used tools. There is also work to be done on specifying the organizational structure of GeoZarr and understanding where requirements from CF diverge from the Zarr data model. For this, we are focusing efforts on involving folks with CF expertise to guide these conversations.
If you are interested in helping out, please join the next bi-weekly GeoZarr meeting every other Wednesday at 11 EST. The next will be March 20th and you can find the invite on the Zarr calendar or join directly from this link. Check out the notes from past meetings at the hackmd.
A final priority of the Zarr Sprint was to get efforts rolling on how to better visualize Zarr on the web.
Kevin Booth is the lead on this effort. Currently, he has added some sidecar files with links that reference parent, child, and root relationships within a Zarr store, which would allow a client to traverse a Zarr store without needing an object storage interface with list capabilities. To demonstrate how this could work, Xavier Nogueira created traverzarr, which allows users to navigate a Zarr store as if it were a file system. A more detailed blog post with updates on this work will come in the next week.
Work on this has continued since the sprint. In collaboration with the Zarr community, the Cloud-Native Geospatial Foundation has started holding bi-weekly meetings to hack on this work. The next will be held at 12 EST on March 14th. If you would like to be involved in this, email [email protected] to be added to the meeting invite, or find the meeting link at the Zarr calendar here.
It was great to get a group of people together to spend some dedicated time on Zarr, and plenty of work remains. Please help keep the momentum of these efforts going by responding to any GitHub Pull Requests, Issues, or Discussions that you have opinions on and joining any of the established Zarr meetings that are of interest to you.
The GRASS GIS 8.3.2 maintenance release contains more than 30 changes compared to 8.3.1. This new patch release includes important fixes and improvements to the GRASS GIS modules and the graphical user interface (GUI), making it even more stable for daily work.
The post GRASS GIS 8.3.2 released appeared first on Markus Neteler Consulting.
My thanks to Mark Brooks for this advice and tips.
Dr. Qiusheng Wu, Associate Professor at the University of Tennessee.
My primary motivation stems from a passion for harnessing the power of geospatial technologies to address environmental challenges. In the Spring of 2020, while teaching Earth Engine at the University of Tennessee, I encountered a significant obstacle. The Earth Engine Python API documentation was very limited, making it difficult for my students to effectively visualize and explore Earth Engine data interactively. This challenge inspired me to create geemap to bridge this gap. Since its initial release on GitHub in March 2020, I have dedicated considerable time and effort to fixing bugs and adding new features to the package. I was thrilled to see that geemap was adopted by Google and included in the Earth Engine documentation in October 2023.
The creation of leafmap was a response to the need for interactive visualization of geospatial data beyond Earth Engine. Leafmap allows users to visualize geospatial data across multiple cloud providers, such as Microsoft Planetary Computer and AWS, with minimal Python coding. My goal is to lower the barriers to entry, empowering students, researchers, and developers worldwide to leverage the immense potential of these powerful geospatial technologies and cloud computing. The invaluable feedback and feature requests from the community play a crucial role in shaping these tools’ ongoing development and evolution.
One of the most rewarding things is connecting with people who benefit from my videos all over the world. For instance, I received a heartwarming message from a learner who shared how my work has transformed their research and teaching, ultimately helping them secure a tenure-track position at a university. Knowing that my resources played a significant role in their success is incredibly fulfilling. Hearing about students applying the techniques in their thesis work, researchers using them to make new discoveries, or developers using them to build geospatial solutions for customers fuels my desire to create even better, more accessible tools and tutorials. I also welcome bug reports or feature requests on GitHub. Their feedback and requests guide the content I develop and ensure it’s truly meeting the needs of the community.
One of the standout features is the ability to publish and share geospatial data with the community using the AWS CLI. This eliminates the need to set up and manage an AWS S3 bucket and access permissions on my own, saving me valuable time and effort. In addition, its intuitive file browser-like interface makes finding the right data within the repository a breeze.
Based on my experience, I highly recommend Source Cooperative to anyone interested in making geospatial data more accessible. The platform’s features and capabilities have certainly influenced my approach to geospatial data analysis, enabling me to work more effectively and efficiently. Additionally, Source Cooperative has contributed to my research by providing a reliable and convenient platform for data storage and sharing. I highly encourage anyone in the geospatial community to give Source Cooperative a try. It’s a valuable tool that has the potential to greatly enhance your geospatial endeavors.
Source Cooperative supports cloud-native geospatial data formats like Cloud Optimized GeoTIFF and GeoParquet. This means that the data stored on Source Cooperative can be seamlessly consumed by popular open-source libraries such as leafmap and DuckDB. In the past, running even simple summary statistics on large vector datasets would take me hours. However, with DuckDB and the GeoParquet files on Source Cooperative, I can now perform analysis that used to take hours in a matter of seconds. It’s a significant improvement in efficiency and productivity.
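As a rough sketch of that workflow (with a placeholder URL and column name rather than a real Source Cooperative repository), a summary statistic over a remote GeoParquet file can be computed directly with DuckDB:

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")

# Placeholder URL and column: any GeoParquet file reachable over HTTPS can
# be summarized this way without downloading the whole file first.
con.sql(
    """
    SELECT country, count(*) AS n_features
    FROM read_parquet('https://example.org/data.parquet')
    GROUP BY country
    ORDER BY n_features DESC
    """
).show()
```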
I was thrilled to integrate these advancements into my Spatial Data Management course at the University of Tennessee, where students experienced these benefits firsthand. Open-access course materials are available here.
Every journey has its challenges, and my own path in the open-source geospatial field has been no exception. Along the way, I’ve encountered many obstacles that have taught me valuable lessons and profoundly influenced my work and the impact I strive to make.
One of the hard lessons I’ve learned is the importance of embracing failure and setbacks as opportunities for growth. I’ve experienced moments where I dedicated hours or even days to implementing features in my open-source packages, only to face setbacks. These experiences have taught me that failure is not a reflection of my abilities, but rather a chance to learn, adapt, and improve.
Collaboration and community have also played a pivotal role in shaping my approach to work. I’ve been fortunate to connect with talented open-source package developers from around the world, exchanging insights, collaborating on projects, and collectively pushing the boundaries of what we can achieve.
Lastly, I have learned the importance of adaptability and staying current with technological advancements. The geospatial field is constantly evolving, with new tools, techniques, and data sources emerging regularly. To stay relevant and make a meaningful impact, I have had to embrace a mindset of continuous learning and adaptability. This involves staying updated with the latest trends and being open to exploring new technologies and tools that can potentially be integrated into my open-source packages.
As a community, there are several actions we can take to accelerate the adoption of cloud-native data:
By focusing on education, documentation, and collaboration, we can collectively drive the adoption of cloud-native data. Together, we can harness the power of cloud technologies and unlock new possibilities in the geospatial domain.
As an open-source developer and educator, I’ve found various media forms to be influential in shaping my perspective on geospatial science and its role in environmental understanding.
Open-source package documentation: The documentation of open-source packages like ipyleaflet, localtileserver, geopandas, rasterio, and xarray, has been invaluable in learning about geospatial tools and libraries. These resources offer practical guidance and examples for incorporating geospatial analysis into my work.
Podcasts: MapScaping and MindsBehindMaps have been instrumental in broadening my perspective. These podcasts feature interviews with experts in the field, discussing topics ranging from remote sensing to artificial intelligence in geospatial science.
Online courses: Online courses are an excellent resource for learning geospatial concepts and technical skills, and several platforms offer valuable options. One notable platform is SpatialThoughts, which provides a range of geospatial courses that are highly beneficial for learners. I have personally found these courses to be incredibly valuable in improving my technical skills. I highly recommend exploring the courses offered by SpatialThoughts and taking advantage of the opportunity to learn from their comprehensive and well-designed curriculum. Whether you are a beginner or an experienced professional, these courses can help deepen your understanding of geospatial concepts and advance your skills in the field.
At Sparkgeo we don’t just make maps. We address a range of geospatial challenges and pursue the most appropriate solutions with an open and independent mindset. Those solutions sometimes produce pixels. Those pixels are often served over a network. We’re always striving to improve how we develop, test, and support our solutions. As part of …
The post Map Tile Identification: A New Addition to your Toolbox appeared first on Sparkgeo.
Over the last year or so there has been a lot of hype about artificial intelligence (AI) and Large Language Models (LLMs) in particular, such as Generative Pre-trained Transformers (GPT) like ChatGPT. In a recent editorial in Environment and Planning B written by Qingqing Chen and myself we discussed how LLMs could be used to lower the barrier for researchers wishing to study urban problems through the lens of urban analytics. For example, analyzing street view images in the past required training and segmentation of such data, which is a time-consuming and rather technical task. But what can be done using ChatGPT? To test this we provided ChatGPT with some images from Flickr and Mapillary:
Figure: Examples of using ChatGPT for extracting information from imagery.
Figure: Example questions and responses when using ChatGPT for extracting information from imagery.
If this sounds of interest I encourage you to read the editorial and think how you could leverage LLMs for your own research.
Full Reference:
Crooks A.T. and Chen, Q (2024), Exploring the New Frontier of Information Extraction through Large Language Models in Urban Analytics, Environment and Planning B. Available at https://doi.org/10.1177/23998083241235495. (pdf)
Do you know what your users’ needs or challenges are? Effective user experience (UX) design is a core element that can make or break a product’s success. At Sparkgeo the design team understands the significance of UX in creating solutions that not only meet but exceed user expectations. This post is the first in a …
The post Elevating Experiences: The Crucial Role of User Experience Design appeared first on Sparkgeo.
In a world where geospatial technology continues to evolve, we try to place ourselves at the intersection of innovation and inclusivity. We often imagine a map as a visual guide, but there are opportunities to make this information more accessible to everyone, including those with visual impairments. Today, we explore the concept of map-to-speech technology, …
The post Map-to-Speech: A Method for Making Web Maps More Accessible appeared first on Sparkgeo.
Billari, F.C. and Prskawetz, A. (eds.) 2003. Agent-based computational demography: Using simulation to improve our understanding of demographic behaviour. Springer.
Grow, A., & Van Bavel, J. (eds.). (2017). Agent-based modelling in population studies. Springer.
Other interesting papers relating to ABCD include a JASSS article entitled "When Demography Met Social Simulation: A Tale of Two Modelling Approaches" which showcases how demography and agent-based modeling can be linked using the UK as an example. For interested readers, our own work on synthetic populations can be found here.
Radar equipped with an active sensor that illuminates objects is defined as active, while radar with a passive sensor that relies on external sources of illumination is defined as passive. Remote sensing is the process by which the physical characteristics of an area are detected and monitored by measuring its reflected and emitted radiation at a […]
The post Application of Radar in Remote Sensing first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
A multilinear map is a map of several variables that is linear in each variable separately, where the variables are elements of vector spaces. Thus, a multilinear map takes vectors from several vector spaces as input and outputs an element of a vector space. The idea of multilinear mapping is critical in several fields and in performing encoding tasks. This […]
The post Applications of multilinear maps first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
Radar is a system that transmits high-frequency signals toward a target. After the signal bounces back from the target and returns, the radar uses the received information to identify the relative location and speed of the target. Militaries use this technology as both a defensive and an offensive tool. RADAR means Radio Detection […]
The post Uses of Radar in the military first appeared on Grind GIS-GIS and Remote Sensing Blogs, Articles, Tutorials.
Abstract
Dust storms are natural phenomena characterized by strong winds carrying large amounts of fine particles which have significant environmental and human impacts. Previous studies have limitations due to available data, especially regarding short-lived, intense dust storms that are not captured by observing stations and satellite instruments. In recent years, the advent of social media platforms has provided a unique opportunity to access a vast amount of user-generated data. This research explores the utilization of Flickr data to study dust storm occurrences within the United States and their correlation with National Weather Service (NWS) advisories. The work ascertains the reliability of using crowdsourced data as a supplementary tool for dust storm monitoring. Our analysis of Flickr metadata indicates that the Southwest is most susceptible to dust storm events, with Arizona leading in the highest number of occurrences. On the other hand, the Great Plains show a scarcity of Flickr data related to dust storms, which can be attributed to the sparsely populated nature of the region. Furthermore, seasonal analysis reveals that dust storm events are prevalent during the Summer months, specifically from June to August, followed by Spring. These results are consistent with previous studies of dust occurrence in the US, and Flickr-identified images of dust storms show substantial co-occurrence with regions of NWS blowing dust advisories. This research highlights the potential of unconventional user-generated data sources to crowdsource environmental monitoring and research.
Figure: Data collection and workflow.
Figure: Distribution of Flickr-identified dust storm occurrences and NWS dust storm advisories.
Full Reference:
Join us on January 30, 2024, in San Francisco for our inaugural GeoParquet Community Day to highlight the usage of spatial data in Parquet, open table formats, and cloud-native spatial analytics. We are proud to have Wherobots as the convening sponsor.
Why GeoParquet
GeoParquet is an open-source project that recently released version 1.0.0, marking the culmination of extensive development, testing, and real-world usage. This stable foundation ensures reliability and consistency for the geospatial data community.
GeoParquet integrates spatial data into the Apache Parquet format, unlocking opportunities for data analysts, scientists, and engineers. It simplifies working with spatial data in the cloud and enables sophisticated spatial analytics.
The GeoParquet Community Day
The goal of this event is to introduce GeoParquet and the benefits of its practical applications to more people. Participants will learn from experts about how GeoParquet integrates into the spatial data ecosystem and engage in hands-on experience building an application.
What to expect
How to register
If you are interested in participating, we invite you to register here.
Call for speakers
Have you used GeoParquet in a project? Here is an opportunity to present a 5-minute lightning talk! Apply now to become a speaker.
Keep an eye out for potential travel grants
We may be able to support travel for attendees, contingent on funding. Stay updated with our latest announcements by following us on LinkedIn and Twitter.
Interested in sponsoring this sprint?
We are seeking additional sponsors to make this event as inclusive and impactful as possible. Sponsors will play a crucial role in two key areas: funding travel scholarships for those facing financial barriers, and growing the GeoParquet community.
To learn more about sponsorship opportunities, please see our sponsorship prospectus.
In Part 1 of this series, we used satellite imagery to understand the impact of recent droughts on crop growth in the Canadian province of Alberta. In Part 2, we continue and extend our previous analysis, taking a more detailed view of specific areas in the province. But first it is worth mentioning why I’m …
The post Mapping Drought from Space – Part 2 appeared first on Sparkgeo.
At the Cloud Native Geospatial Foundation, we encourage the adoption of cloud-native data architectures that make geospatial data more accessible, faster, and easier to work with. We believe that cloud-native geospatial data is a powerful tool to help us achieve our mission of making more data more available to more people. We seek to create a larger and more diverse community of people who can use geospatial data to solve problems all over the world.
To highlight how different communities are using cloud-native geospatial data, we invite you to dive into real-world use cases from various regions, with the first series of regionally-focused webinars taking place next month:
Webinar 1: Cloud-Native Geospatial in Brazil (conducted in Portuguese)
In our first webinar, we shine a spotlight on cloud-native geospatial use cases from Brazil. Moderated by Daniel Wiesmann, product manager at Development Seed, we welcome Frederico Liporace, the chief technology officer at AMS Kepler, to talk about how China-Brazil Earth Resources Satellite (CBERS)/Amazonia satellite data is made available on Amazon Web Services. Roberto Santos, a senior software engineer at Microsoft, will discuss the application of STAC and GeoTIFF COG in FarmVibes.AI.
This webinar will be in Portuguese and scheduled for Dec 5, 2023, at 12 PM BRT | 3 PM WET | 10 AM EST. Register to attend this webinar here: https://lu.ma/CNG-Brazil.
Webinar 2: Cloud-Native Geospatial in the Pacific
In the second webinar, moderated by Geospatial Data Scientist Wei Ji Leong from Development Seed, we explore applications of cloud-native geospatial in the Pacific region. You will hear from Alex Leith, the technical director at Auspatious, who will briefly describe Digital Earth platforms and their journey to cloud-native geospatial. Data Engineer Leo Ghignone from AODN will introduce a cloud-native data platform for the Great Barrier Reef and discuss how they apply open source tools like STAC for their metadata catalog and pygeoapi for their data API. Then we have FrontierSI's Earth Observation Technical Lead, Fang Yuan, who will introduce an ArcGIS toolkit that FrontierSI has developed in collaboration with their partners. This toolkit uses Digital Earth Australia's analysis-ready, cloud-native satellite data to map groundwater-dependent vegetation and track changes in vegetation over time.
This webinar is scheduled for Dec 14, 2023, at 6 PM EST / Dec 15, 2023, at 10 AM AET | 12 PM NZDT. To attend this webinar, register here: https://lu.ma/CNG-Pacific.
Who should attend the webinar series?
The webinar series is open to anyone interested in learning more about cloud-native geospatial, including:
If you have any questions, please contact [email protected].
Brandon Liu – a technologist working at the vanguard of mapping, global network infrastructure, and vector graphics – has made an outsized impact on the open source geospatial community. The founder of Protomaps, an interactive mapping company established in 2019, Brandon has created a map project that has reshaped how we interact with geospatial data on the web. His creations include PMTiles, a serverless solution that optimizes the storage and retrieval of millions of map tiles in the cloud. In his current role as a Technology Fellow at Radiant Earth, Brandon continues his focus on advancing the geospatial domain through innovation, integration, and knowledge sharing. In this Q&A, we explore Brandon’s journey, the innovative work of Protomaps, his perspectives on the geospatial technology landscape, and his vision for the future of the field.
My journey in geospatial started when I discovered OpenStreetMap in 2012. I used OSM data to power a few of my own civic tech projects, like a 1980s street view app, and began building web mapping applications freelance for companies and humanitarian organizations.
The “full-stack” nature of mapping has always appealed to me. Geo is powered by domain-specific data structures and algorithms. Its end goals are not defined by hard technical requirements, but by human factors like: Do users recognize key city names and landmarks on the map? Is a map visualization a truthful representation of the data?
Over 10 years of making web maps, I repeatedly ran into the same pain points for visualizing vector data using open source tools. No client project I worked on could invest years in foundational infrastructure. Protomaps is the answer to those pain points - the distillation of a decade of open source mapping experience that I hope advances the state of the art.
Protomaps is a bootstrapped business - my mission is to demonstrate that it is possible for open source developers to make a living from their work. Being an independent business influences the design of formats like PMTiles, which is an alternative to Software-as-a-Service delivery for maps. Proprietary SaaS products, with some open source frontend sprinkled in, remain the most investor-friendly business model.
Being free from the constraints of a typical tech company means Protomaps is a better fit for applications underserved by the traditional software industry. Journalism and the public sector are two areas that have adopted PMTiles. Other use cases I’ve learned about in the past month include wildfire mapping in British Columbia and a storytelling app for indigenous communities.
Data analysts and stakeholders always want to see “all the data, on a map.” Making this possible in the web browser - the most ubiquitous and open computing platform - is a key benefit of cloud-native: it’s not a walled garden like Desktop GIS or smartphone App Stores.
A key challenge for cloud-native solutions is ensuring the move to the cloud does not create vendor lock-in. For the PMTiles ecosystem, it’s important to have first-class support for both AWS Lambda and Cloudflare Workers. Though both serverless platforms are proprietary, users of PMTiles should be able to migrate freely between them.
Another challenge for “cloud-native” is to work without the cloud at all. Many geospatial applications need to be air-gapped or work in developing countries with limited internet connectivity. A big hard drive there is more useful than S3! The PMTiles ecosystem works just as well off the cloud, though, and for that reason, it’s been adopted in field mapping and aviation.
Object storage is a de facto standard that creates near-perfect competition between cloud platforms. Users can adopt with confidence that their infrastructure won’t be deprecated on a whim by companies, and are free to migrate to other providers. This isn’t possible with closed platforms like Google Earth Engine.
A similar development happened for videos on the web. You used to need Flash or RealPlayer to view movie clips, but Range Requests made streaming video possible with plain HTTP. That advancement also led to the standardization of <video> elements in browsers. In the same way, the Protomaps ecosystem brings interactive geospatial visualization to the browser, agnostic to what object storage it lives on, all using web standards.
My goal as a Radiant Earth Fellow is to support the community of Source Cooperative users through tooling. I’ve described vector data as the “missing half” of cloud-native geospatial. The PMTiles format already appears in several Source repositories. In the course of my fellowship, I’ve focused on improving and documenting the pmtiles and tippecanoe command line tools for visualizing vector data.
Beyond datasets on Source, the Protomaps ecosystem meets developers where they are by integrating with popular visualization libraries. I’ve enhanced the MapLibre GL mapping library, shipped v1.0 of OpenLayers integration, and will be further improving the integration with Leaflet. The OpenLayers plugin is already being used in the automatic preview on Source Cooperative.
WebAssembly is one technology that has been emerging for half a decade now, and its impact has been limited relative to its hype. I think that hype is real for Geospatial WebAssembly: serious geo applications need access to heavyweight libraries like DuckDB, GEOS, and GDAL, and the bundled size of WASM apps is an acceptable tradeoff. Whether the tools use WASM or not, I see a greater focus on data visualization and full analysis directly on the web. Anyone interested should check out Kyle Barron’s work on GeoParquet in the browser.
A major trend in the technology industry and fundraising environment, beyond geospatial, is the adoption of licenses like the Business Source License instead of permissive ones. Companies that spend years developing software rightly refrain from giving it away for free to larger competitors. A consequence is that smaller or more civic-minded applications also lose access to that software. In this new licensing atmosphere, I predict a greater role for foundations like Radiant Earth and the Cloud-Native Geospatial Foundation to take the lead on open source.
The development I’m most excited about for 2024 is taking the Protomaps project global. Protomaps’ core data product is an openly licensed cartographic tileset, and its focus up to now has been North America and Western Europe, meaning most users view the map in languages like English or French. Features I’m working on now will enable deploying the map in more than 25 languages; together with global datasets on Source Cooperative, anyone can build an affordable mapping application for an audience in South America, Africa, Asia, or anywhere in the world!
One channel I’ve adopted in 2023 is GitHub Sponsors - I’ve found it a great way to connect with an audience of developers working with the technology every day. I also post updates on Mastodon and X, and you can find me at open source geo conferences on multiple continents!
So the last few months I embarked on a seemingly simple project – translate more data into GeoParquet to help its adoption. My modest goal to start was to try to be sure there’s ten or twenty interesting datasets in GeoParquet, so it’s easy to try it out with some practical projects. But I’ve gone down a deep rabbit hole of exploration, which is by no means finished and indeed feels barely started, but I wanted to share what I’m seeing and excited about.
The simple core of it is an approach for distributing large amounts of geospatial vector data, but I think there are potentially profound implications for things like ‘Spatial Data Infrastructure’ and indeed for our industry’s potential impact on the world. For this post I’m going to keep it focused on the core tech stuff.
I’ve explained a few aspects of what I’ve learned about GeoParquet and DuckDB in my quest to make the Google Buildings dataset more accessible. Most of what I learned was actually fairly incidental to the main thing I was trying to figure out: whether there’s a way to distribute a large-scale dataset in a way that’s accessible to traditional desktop GIS workflows, future-looking cloud-native queries, geospatial servers, and mainstream (non-geo) data science / data engineering tools. To be explicit on each:
One of the beauties of the Cloud-Optimized GeoTIFF is that it is incredible for cloud-native queries – efficiently returning the location and just the requested bands of the raster image. But it’s also completely backwards compatible, as any COG could be loaded up in any traditional GIS tool. I don’t believe we’ll get anything so incredibly backwards compatible in the vector data world (I did pursue Cloud-Optimized Shapefile for a bit, but the only tool to actually order it right is buried in MapServer, and it’s got some serious problems as a format). But if we can teach QGIS (already done) and Esri (hopefully will happen before too long) to read GeoParquet then you can support that traditional workflow of downloading data and dropping it into the desktop. And to be clear, I don’t use ‘traditional’ to be dismissive or negative at all. It’s actually a really great workflow, and I use it all the time. I like streaming stuff, but I also like just having it on my computer to use whenever I want. I’m writing this on a plane right now, and it’s nice I can still explore locally.
So this is going to be a long-ass blog post going into all that I’ve explored, along with most of the ideas that I want to explore next. But I’ll start with the end for those who just want the tl;dr. I think there’s some great potential for GeoParquet partitioned by administrative boundaries to be an ideal vector data distribution format, especially when paired with STAC and PMTiles. In the course of working with Google Open Buildings data the first version of Overture data dropped, with Parquet as the core format. Awesome! Except it’s not GeoParquet, and it’s actually pretty hard to use in most of the workflows above. The team put together great docs for lots of options, but it wasn’t easy or fast to subset to an area you care about.
I wanted to try to help make that easy, so I had some good fun adapting what I’d learned from Google Buildings, and I’ve put up both the building and places datasets on Source Cooperative for each country as GeoParquet. If the country is big (over ~2 gigs) then I break it up by quadkey (more on that below). I’ve focused on GeoParquet, instead of putting up versions in GeoPackage, Shapefile, GeoJSON & FlatGeobuf. I think it’s hugely important to support those formats, but my idea has been to try out cloud-native queries that transform the data on the fly, instead of maintaining redundant copies in every format. And it works!
I’ve built an open source command-line tool that takes a GeoJSON file as input and uses DuckDB to query the entire GeoParquet dataset on source.coop, downloads just the buildings that match the spatial query, and outputs them into the format the user chooses. You can check out a little demo of this:
This response time is ~6 seconds, but requires a user to supply the ISO code of the country their data is in (I have some ideas of how to potentially remove the need for the user to enter it). Without entering the ISO code the request scans the entire 127 gigabytes of partitioned GeoParquet data, and takes more like 30 seconds.
In many ways 30 seconds is a long time. But remember there is absolutely no server involved at all, and you are able to get the exact geospatial area that you care about. And in traditional GIS workflows the user would have to wait to download a 1.5 gigabyte file, which would take several minutes on even the fastest of connections.
The query just uses stats from the partitioning of Parquet to home in on the data needed, returns it to a little local, temporary DuckDB database, and uses DuckDB’s embedded GDAL to translate out into any format. If you request huge amounts of data through this then the speed of getting the response will depend on your internet connection.
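A minimal sketch of that core query might look like the snippet below, assuming the geometry is stored as WKB in a column named geometry and using DuckDB's spatial extension for the filter and the GDAL export; the bucket path, area-of-interest file, and output driver are placeholders, and this is not the actual CLI implementation.

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL spatial; LOAD spatial; INSTALL httpfs; LOAD httpfs;")

# Placeholder paths: a local GeoJSON area of interest and a partitioned
# GeoParquet dataset on object storage.
con.sql(
    """
    COPY (
        SELECT b.*
        FROM read_parquet('s3://example-bucket/buildings/*/*.parquet') AS b,
             ST_Read('aoi.geojson') AS aoi
        WHERE ST_Intersects(ST_GeomFromWKB(b.geometry), aoi.geom)
    ) TO 'buildings.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG');
    """
)
```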
Right now this is a command-line tool, but I’m pretty sure this could easily power a QGIS (or ArcGIS Pro) plugin that would enable people to download the data for whatever area they specify on their desktop GIS. If anyone is interested in helping on that I’d love to collaborate.
This tool should work against any partitioned GeoParquet dataset that follows a few conventions. There’s a few out there now:
Most of these are over 80 gigabytes of GeoParquet data, and they’d be even larger if they were more traditional formats. Right now the CLI only works with the first two, using a ‘source’ flag, but I’m hoping to quickly expand it to cover the others, as only minor tweaks are needed.
The other thing that’s worth highlighting is that if you just want to visualize the data you can easily use PMTiles. It’s an incredible format that is completely cloud-native and gives the same web map tile response times you’d expect from a well-run tile server.
So I wanted to describe the core thing that makes this all work. It could probably use a snappy name, but my naming always tends towards the straightforward description. I suspect there's many things to optimize, and ideally we get to a set of tools that makes it easy to create an Admin-partitioned GeoParquet distribution from any (large) dataset.
One main bit behind it is Parquet's 'row groups', which are defined at the level of each file, breaking it up into chunks that can report stats on the data within them. Both the row groups and their stats can be used in a 'cloud-native' pattern – querying the 'table of contents' of each file with an HTTP range request to figure out what, if any, of the data is needed, enabling efficient streaming of just the data required. For many other formats that table of contents is in the header; for Parquet it's actually the footer, but it works just as well. I'm far from the expert on this stuff, so if anyone has a good explanation of all this that I can link to let me know.
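To make that a bit more concrete, here is a rough sketch with pyarrow and fsspec of what reading that 'table of contents' looks like. The URL is just a placeholder, and exactly how much gets fetched depends on the client, but the point is that the per-column stats of every row group come from the footer metadata, not the data pages.

```python
import fsspec
import pyarrow.parquet as pq

url = "https://example.com/data/Jamaica.parquet"  # placeholder remote file

with fsspec.open(url, "rb") as f:
    pf = pq.ParquetFile(f)   # roughly speaking, only footer metadata is read here
    meta = pf.metadata
    print(meta.num_row_groups, "row groups,", meta.num_rows, "rows")

    # Each row group reports min/max statistics per column, which is what lets
    # a reader skip chunks that can't possibly match a query.
    rg = meta.row_group(0)
    for i in range(rg.num_columns):
        col = rg.column(i)
        if col.statistics is not None:
            print(col.path_in_schema, col.statistics.min, col.statistics.max)
```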
Building on the row groups, the other main innovation is the ability of most tools to treat a set of Parquet files as a single logical dataset, using 'hive partitioning'. I believe the core idea behind this was to have it work well on clusters of computers, where each file could fit in memory and you can do scale-out processing of the entire dataset. Generally the best way to split the files up is by the most commonly used data field(s). If you have a bunch of log data then date is an obvious one, so that if you just want to query the last week then it's a smaller number of files. The footer of each Parquet file easily reports the stats on the data that is contained in it.
With geospatial data the most common queries are usually by the spatial column – you care about where first, and then filter within it. So it makes a ton of sense to break files up spatially. And it also makes a ton of sense to organize the row groups spatially as well. After a decent bit of experimentation I think the pattern looks something like:
For global datasets I find it really nice to have the data split by country. It’s often what people are looking for, and you can just point them at the file and it all makes sense. And the cool thing is if you just split it up that way then the stats get all nice for any spatial query. It’s ok if it’s a query that spans countries, and if the countries overlap in complicated ways – querying a few extra files isn’t a huge overhead. But the spatial query is able to quickly determine that there’s lots of files that it doesn’t need to actually interrogate.
This is as yet a very 'broad' pattern, and I think there's a lot of things to test out and optimize, and indeed there may be cool new indexing and partitioning schemes that are even better. I suspect that you could optimize purely on spatial position, like just break up files by quadkey, and it'd likely be faster. But my suspicion is that the gain in speed doesn't outweigh the 'legibility', unless you weren't exposing your core source files at all. There's a few things I've explored, and a number of things to explore more – I'll try to touch on most of them.
One big thing to note is that GeoParquet’s geometry definition in 1.0.0 is a binary blob, and thus it doesn’t actually show up in the stats that Parquet readers are using to smartly figure out what they should evaluate. It is a very ‘standard’ binary blob, using the OGC’s Well Known Binary definition, so it’s easy for any reader to parse. But it doesn’t actually enable any of the cloud-native ‘magic’ – the blob just sorta comes along for the ride.
The Overture Parquet distribution actually did something really interesting – they made a BBOX column, defined as a 'struct' with minx, miny, maxx, maxy fields within it.
Since these are in a native Parquet structure they each get stats generated on them, so it actually does enable the cloud-native 'magic'. You should be able to use those plus row groups like a spatial index to query more efficiently. Just create a manual BBOX query in your WHERE clause (like bbox.minX > -122.5103 AND bbox.maxX < -122.4543 AND bbox.minY > 37.7658 AND bbox.maxY < 37.7715), and hopefully it does the bounds check much faster, with the more expensive intersect-type comparison only happening where the bounds overlap.
The core Parquet files from Overture weren’t structured at all to provide any advantages from this type of query, however. Each file contained buildings from the entire globe, and then the row groups were clearly not ordered spatially at all, see how they load:
110mb Overture Maps default partition, translated directly to GeoParquet w/ GPQ
Every building everywhere loads at once, meaning each row could be anywhere in the world, so you can’t ‘skip’ to just the relevant part. The experience of watching most geospatial formats load in QGIS is usually more like:
162 mb Overture Maps data, partitioned to Jamaica, data ordered on ‘quadkey’ column w/ DuckDB
This is because almost all geospatial formats include a spatial index, ordering the data spatially, so when they load the features appear in spatial chunks. The gif below is the same Overture data, in a file that was partitioned to just be Jamaica, and in the creation of the file with DuckDB I added a quadkey column and then my output command was:
COPY (SELECT * FROM buildings WHERE country_iso = 'JM' ORDER BY quadkey) TO 'GY.parquet' WITH (FORMAT PARQUET);
You actually could drop the quadkey column (SELECT * EXCLUDE quadkey FROM buildings…) and the file would still load similarly. The manual bbox query should still be able to take advantage of the ordering, since the data is now grouped so the row group statistic optimizations can skip data.
I was really excited by having this bbox struct, and in my initial development of the client side tool I thought that I could just query by the struct. But unfortunately the results weren’t really performant enough – it took minutes to get a response. I did make a number of other improvements, trying a number of different things at once, so it’s definitely worth revisiting the idea. Please let me know results if anyone digs in more.
The breakthrough for performance seemed to be just querying directly on the quadkey. Where I landed was doing a client side calculation of the quadkey of whatever the GeoJSON input was. Then I could just issue a call like:
select id, level, height, numfloors, class, country_iso, quadkey,
ST_AsWKB(ST_GeomFromWKB(geometry)) AS geometry
from read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-quad-hive/*/*.parquet', hive_partitioning=1)
WHERE quadkey LIKE '03202333%' AND ST_Within(ST_GeomFromWKB(geometry),
ST_GeomFromText('POLYGON ((-79.513869 22.451, -79.408082 22.451, -79.408082 22.573957, -79.513869 22.573957, -79.513869 22.451))'));
NOTE: one thing I do want to mention is that I believe all of these queries should work equally well with GDAL/OGR. I've been liking DuckDB a lot recently, but OGR's Parquet support can query a partitioned dataset, and I believe using its SQL functionality will query in the same way.
So that worked really well. There are some places on the earth where the quadkey partitions don’t line up that well, like if the area is straddling the border of a really large quadkey. So I’m sure there’s some indexing schemes that could be better, I’m very, very far from being an expert in spatial indexing. Read on for a later section on indices.
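For reference, the client-side quadkey step can be sketched with the mercantile library. This mirrors the idea, not my exact code: take the bounding box of the input GeoJSON, find the smallest web-mercator tile that contains it, and use that tile's quadkey as the LIKE prefix. The bbox below is the Cuba example from the query above.

```python
import mercantile

# Bounding box of the query GeoJSON (west, south, east, north).
west, south, east, north = -79.513869, 22.451, -79.408082, 22.573957

tile = mercantile.bounding_tile(west, south, east, north)
quadkey = mercantile.quadkey(tile)
print(f"zoom {tile.z}, quadkey prefix '{quadkey}'")

# If the bbox straddles the boundary of a big tile, bounding_tile returns a
# very low-zoom tile and the quadkey prefix stops being selective - exactly
# the problem described above.
```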
The other cool development to mention is that the plan for GeoParquet 1.1 is to add a new geometry format that is a native, columnar 'struct' instead of an opaque WKB. This is all being defined in the GeoArrow spec (big congrats on the recent 0.1 release!), since we want to be sure this struct is totally compatible with Arrow to enable lots of amazing stuff (half of which I still don't understand, but I'm still compelled by it and I trust the people who know more than me). This will work automatically with the stats of Parquet, and if it works well it may mean we don't need the extra index column for efficient cloud-native queries. But I am interested in getting to at least a best practice paper to help guide people who may want to add a spatial index as a column, as it likely still has uses and it'd be nice to have recommendations for best practices (like what to name the column), so clients can know how to request the right info.
The other thing I experimented a good bit with was ‘Hive partitions’. These are actually really simple, and very flexible. It’s just a convention for how to encode key information about a folder into the name of the folder.
As you can see it's a simple scheme: you just name each directory after the column and the value it contains (like country_iso=CA), and put the matching files inside it.
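For example, a layout following that convention might look like the comment below (the paths are hypothetical, but the country_iso=XX pattern is the one I used), and any hive-aware reader exposes the folder name as a regular column:

```python
# Hypothetical bucket/folder layout following the hive convention:
#
#   buildings/
#     country_iso=CA/
#       British_Columbia.parquet
#       ...
#     country_iso=US/
#       California.parquet
#       ...
#
# A hive-aware reader turns the folder names into a queryable column:
import duckdb

duckdb.sql("""
    SELECT count(*)
    FROM read_parquet('buildings/*/*.parquet', hive_partitioning=1)
    WHERE country_iso = 'CA'
""").show()
```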
My main experiments were to use ‘country_iso’ as the column to organize the hive partition on. In my first Google Buildings experiment I split up every country in the folder by admin level 1 (state / province), and was pleasantly surprised that all the hive stuff still worked.
Egypt Partition, with each file split into the admin level 1 entity
Most of the examples and tools with hive do all the ordering and partitioning automatically, and often just have a number of files named data_1.parquet, data_2.parquet. But it was totally fine for me to split them as I liked, name them California.parquet and British_Columbia.parquet, and have everything still work the same. If that were not the case then it'd be a lot more difficult to use those hive partitions with traditional geospatial workflows. It does introduce a slightly weird syntax, having folders named 'country_iso=CA' (you could make it friendlier with country=Canada, but it still looks a bit odd), but if there are real performance gains then it's worth it.
I did start to experiment with those gains, and it seemed that if you actually do use that column in the query there's a pretty big benefit. This needs a lot more testing, but in the main test I did, performance went from 30 seconds to under 3 seconds. The query above, modified to make use of my hive partition, just adds country_iso = 'CU' at the start of the WHERE clause.
select id, level, height, numfloors, class, country_iso, quadkey, ST_AsWKB(ST_GeomFromWKB(geometry)) AS geometry
from read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-quad-hive/*/*.parquet', hive_partitioning=1)
WHERE country_iso = 'CU'
AND quadkey LIKE '03202333%' AND
ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON ((-79.513869 22.451, -79.408082 22.451, -79.408082 22.573957, -79.513869 22.573957, -79.513869 22.451))'));
It does make a ton of intuitive sense why this is faster – without that it needs to issue an http request for the footer of every single file to be sure there’s nothing it needs to interrogate further. So if I know that the query is definitely in Canada then it only has to query the files in that folder.
I did seem to find that if you're not using the hive partition column in the query then the overall queries seem a bit slower. Which also makes sense – without Hive partitions I'd just put all the files in one directory, so with Hive it's got to touch S3 for each directory and then each file, instead of just the files.
The way my CLI works right now is that a user would need to supply the country_iso as an option in order to get it included in the query. But it does feel like there could be a way to automatically figure out what the proper country_iso(s) would be. Right now the quadkey gets calculated on the client side, so I wonder if we could just preprocess a simple look-up file that would give the list of country_iso codes that are contained in a given quadkey. No geometries would be needed, just quadkey id to country iso. It seems like it should work even if a quadkey covers multiple countries – it'd still help to use the hive partition. I think this should be a fun one to make, if you want to take a shot just let me know in this ticket.
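Here is a sketch of what that lookup could look like. Everything about the file (its name, the level it was built at, and its columns) is hypothetical, since it doesn't exist yet; the idea is just prefix matching between the client's quadkey and the precomputed keys.

```python
import csv

QUADKEY_LEVEL = 6  # whatever level the hypothetical lookup file was built at


def countries_for_quadkey(quadkey: str, lookup_path: str = "quadkey_iso.csv"):
    """Return the country_iso codes that could intersect this quadkey."""
    prefix = quadkey[:QUADKEY_LEVEL]
    isos = set()
    with open(lookup_path) as f:
        # Assumed columns: quadkey, country_iso (one row per pair).
        for row in csv.DictReader(f):
            if row["quadkey"] == prefix or row["quadkey"].startswith(prefix):
                isos.add(row["country_iso"])
    return sorted(isos)


# e.g. countries_for_quadkey('03202333') might return ['CU'], which could then
# be added as country_iso IN (...) at the start of the WHERE clause.
```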
One of the most important pieces to make this work well is great administrative boundaries. Ideally ones that fully capture any area where there'd be a potential building (or other objects, if we go beyond buildings). And they need to be true open data with a liberal license. When I started on Google Buildings I used GeoBoundaries CGAZ, which worked pretty well. The one issue is that their coastlines are close in and not all that accurate, so it missed a lot of buildings on the coast and on islands.
Doing the ‘nearest’ boundary works pretty well – I used PostGIS, and VIDA used BigQuery in the Google + Microsoft dataset. It feels like it wouldn’t be 100% accurate, so ideally we’d get to a high quality admin boundary that goes all the way to their ‘exclusive economic zones’ and has area for open ocean. I did make an attempt to buffer the CGAZ boundaries, but I haven’t tried it out yet. It’s still not likely to be 100% accurate in all cases – you could see an island off the coast of one state that actually belongs to another one. You can try it out… as soon as I find the time to upload it and write a readme on source cooperative. But just ping me if interested.
I also tried out the Overture admin boundaries. Admin level 0 was quite easy, as there was an example provided in the docs. And it worked much better than CGAZ with the coastlines – it clearly buffers a lot more:
Red is Overture, Green is CGAZ (and slipped in my CGAZ buffer in grey)
I've been partial to DuckDB lately, which does not yet support nearest (and probably needs a spatial index before nearest is reasonable), but Overture still managed to match over 99.9% (check this) of the data.
Admin level 1 was more challenging with Overture. I believe most of the world calls states and provinces 'level 1', but OpenStreetMap & Overture call it level 4. It was a pretty obscure DuckDB query, but I got results as GeoParquet (I'll also try to upload this one soon). Unfortunately they're basically useless for this purpose, as a number of states are just totally missing (at least in the July release, haven't checked October). Maybe they're in there somewhere and I'm doing it wrong – happy for anyone to correct me and show how to get all the states & provinces. So with Overture I just did admin 0 and then broke the data up by quadkey if it was too big. I kept this for my Google Buildings v3, since I had scripts that were very easily adapted and worked with DuckDB on my laptop. But I think there's still a lot more experimentation to do here.
One thing I did find is you can't do 'unbalanced' hive partitions. It seemed like it would be ideal to just have country files at the root level and only use the hive partition if the country needs to be broken up. This could be really cool, since you could do that two levels down – the California file is often huge, so California could have a folder with its counties available for download, but smaller states would just be a single file. Unfortunately you need to put each file in its own folder for the hive partition to work its magic.
I hope to refine a ‘best practice’ more, to balance the ‘traditional’ GIS workflow of a folder structure that’s intuitive to navigate but also cloud-native performant. The things on the list include:
I’d welcome anyone experimenting with this more. And I think the big thing to do is figure out which of the administrative boundary options is best for this use case. I got a recommendation for Who’s on First, and am open to other options. I think the key thing is to be sure all potential buildings on islands and coastlines are captured.
So what counts as 'too big', the point at which a file should be broken up, from the previous section? I've been using approximately 2 gigabytes as the upper size at which to cut off and break files into smaller parts. This mostly came about because I'm working on a desktop computer, and with Pandas in particular anything with more than 2 gigabytes or so of data would take up too much memory. And other programs would also perform less well. I think I've also heard that in the general scale-out data world it's also good to keep data smaller than that, so each section completely fits in memory when you process it with a cluster.
I've been using a good bit of Pandas recently, since it was the best option for setting the row group size on data like Overture with nested Parquet data structures. I did attempt to use GDAL/OGR for processing data, since it does let you set row group size and it seems like it should work, but I couldn't get my OGR code to load the geometry column if the data was not compliant GeoParquet (DuckDB doesn't yet write out GeoParquet, so I needed a post-process step) – any help is appreciated (PR's welcome ;)). GPQ has been my go-to tool, as it does great with huge files and memory management, but it did not yet support control over row group size. But now it does!
So I’m sure I’ll just use GPQ in the future, as it’ll handle much bigger Parquet conversion with ease.
I am curious about the performance of querying large files in a cloud-native way, but it does feel a bit more 'usable' for a traditional GIS workflow if you're not having to download a 10 gigabyte file for all of Brazil. So I lean a bit toward breaking them up. VIDA put up both approaches in their Microsoft + Google dataset – /geoparquet/by_country/ has one file per country, with a 12.7 gig US file, and /geoparquet/by_country_s2/ splits by S2. I'd love to see some performance testing between the two (and to get both options added to my CLI). But I think I lean towards sticking to a ~2 gig max per file.
Ok, so what is this row group size? As I mentioned above row groups are these chunks of rows that provide ‘stats’ on the data within them, so the row group size is how many rows are in each chunk. They essentially serve as a rough ‘index’ for any query. And it turns out setting different sizes can make a pretty large difference when you’re trying to query against a number of files. I was using GPQ which set very large row group sizes in early versions. When I switched to Pandas and used a row group size of 10,000 then I got substantially better performance when doing a spatial query against the entire dataset.
I've been experimenting with different sizes, but I definitely feel like I'm shooting in the dark. I read somewhere that if you have too many row groups per file and you query across a bunch of files then it can slow things down. I hope that someone can experiment with creating a few different versions of one of these larger datasets so we can start to gather recommendations as to what row group size to set, and how to reason about it relative to the number of files. Though just last night at Foss4g-NA Eugene Cheipesh (who shared insights from going deep with GeoParquet indexes more than a year ago) gave some hints. He said to think of row groups like a 'page' in PostgreSQL (I didn't really know what that means…), and to aim for 200kb – 500kb for each. The Python and Go Parquet tools just let you set the number of rows, so it'll take a bit of math and approximation, but should be possible.
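Here is the kind of back-of-the-envelope math I mean, using pyarrow. The 300 KB target is just a number in the middle of that 200-500 KB guidance, the file name is a placeholder, and the size estimate is the in-memory Arrow size rather than the compressed on-disk size, so treat it as a starting point rather than a rule.

```python
import pyarrow.parquet as pq

path = "Jamaica.parquet"  # placeholder input
table = pq.read_table(path)

target_bytes = 300 * 1024                       # aim for roughly 300 KB per row group
avg_row_bytes = table.nbytes / table.num_rows   # rough, uncompressed estimate
rows_per_group = max(1, int(target_bytes / avg_row_bytes))

print(f"~{avg_row_bytes:.0f} bytes/row -> row_group_size={rows_per_group}")

# pyarrow (and pandas/GeoPandas to_parquet, which pass kwargs through to it)
# let you set the number of rows per group directly:
pq.write_table(table, "Jamaica_regrouped.parquet", row_group_size=rows_per_group)
```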
I made it so my command line program that creates GeoParquet from a DuckDB database lets you set both the maximum number of rows per file and the row group size. If there’s more than the max rows per file for a given country then it just starts to split it up by quadkey until it finds one that is less than the max rows. I was pleased with my function that splits countries by the maximum number of rows, as it’s the second time in my entire programming career that I used recursion. I would love to have an option that splits it by admin level 1, for a bit more legibility, but I didn’t get there yet.
I also attempted to get it to try to find the smallest possible quadkey, as occasionally the algorithm would select a large quadkey that could likely be slimmed down without splitting into more quadkeys. I am also curious if a pure partition by quadkey would perform substantially better than the admin 0 partition. I’m quite partial to the admin 0 partition for that traditional GIS download workflow, but if the performance is way better and we have a large ecosystem of tools that understands it then it could be an option. And anyone that’s just using GeoParquet to power an application without exposing the underlying partition may prefer it.
If you have some GeoParquet and are curious about the row group size, the main way I found was to use this DuckDB query:
SELECT * FROM parquet_metadata('Jamaica.parquet');
Each row will have a row_group_num_rows column (you can also just select that column, but the other info is interesting), which is the row group size. If you want the number of row groups, use SELECT row_group_id FROM parquet_metadata('Jamaica.parquet') ORDER BY row_group_id DESC LIMIT 1; and add 1 (0 is the first id).
GPQ’s ‘describe’ also now includes the total number of rows and the number of row groups:
There may be other ways as well, but those both worked for me.
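For completeness, one more way, if you're already in Python: pyarrow reports the same numbers straight from the footer.

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("Jamaica.parquet")
print("row groups:", pf.metadata.num_row_groups)
print("rows in first group:", pf.metadata.row_group(0).num_rows)
```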
The other area where it would be great to see more experimentation is spatial indices. I picked quadkeys, and the choice was fairly random. And I wouldn't say it works perfectly, as some of my sample GeoJSON queries would span borders of 'big' quadkeys, so the quadkey that would fully contain my query would be a level 3 quadkey, like 031, which covers a large chunk of the earth.
So obviously the query would have to perform the spatial filter against a much larger set of candidate data. Perhaps other indexing schemes wouldn’t have that issue? And actually as I write this it occurs to me that I could attempt to have the client side figure out if a couple smaller quadkeys could be used, instead of just increasing the size of the quadkey until the area is found. But I suspect that other indices might handle this better.
I would love to see some benchmarking of these single column spatial indices, and also any other approaches towards indexing the spatial data. And I’m open to people pointing out that my approach is sub-optimal and proposing new ways to do it. The thing that has been interesting in the process is how you can get increased spatial performance by just leveraging the existing tools and ecosystem. I’m sure a DuckDB-native spatial index would perform better than what I did, but I was able to get a big performance boost on remote querying spatial data just by adding a new column and including it in my query. It’s awesome that the whole ecosystem is at the point of maturity where there are a number of different options to optimize spatial data.
I also understand that this scheme isn’t going to work perfectly for all spatial data. Big long coastlines I’m pretty sure will break it, or huge polygons. But many of those I don’t think need to be partitioned into separate files – it’s fine to just have one solid GeoParquet file for all the data. I see this as a set of recommendations on how to break up large datasets, but those recommendations should also make clear when to not use it.
I've not yet had the time to really dig into 'table formats', but I've learned enough to have a very strong hunch that they're the next step. The idea is that with some additional information a bunch of Parquet files on the cloud can act like a table in a database – giving better performance, ACID transactions, schema evolution and versioning. There's a few of these formats, but Apache Iceberg is the one that is gaining momentum. This article gives a pretty good overview of table formats and Iceberg. And Wherobots just released the Havasu table format, which builds on Iceberg and adds spatial capabilities, using GeoParquet under the hood. I'm hoping to be able to add a table format to the 'admin-partitioned GeoParquet distribution' to help even more tools interact with it. I'll write a blog post once we get some results – help on this is appreciated!
Ok, so this is now a long-ass blog post, so I'm going to work to bring it to a close. I think there's a number of potentially profound implications for the way we share spatial data. I just gave a talk at Foss4g-NA that explored some of those implications. The talk was recorded, so hopefully it gets published soon. But if you've made it all the way through this blog post you can peruse the slides ;) Lots of pictures and gifs, but the speaker notes have all that I said. Since it's a talk I had to keep things quite high level, but I'm hoping to find time in the next few months to go a lot deeper in blog posts.
There are clearly lots of open questions throughout this post. My hope is that we could get to a recommended way for taking large sets of vector data and putting it on the cloud to support traditional ‘download’ workflows, cloud-native querying, and everything in between. I’m pretty sure it’ll be partitioned GeoParquet + PMTiles + STAC, plus a table format. And we should have a nice set of tools that makes it easy to translate any big data into that structure, and then also lots of different tools that make use of it in an optimized way.
I'd love any help and collaboration to push these ideas and tools forward. I've not really been part of a proper open source project in the last ten years or so, but we've got a small group coalescing around this open-buildings github repo where I started hacking on stuff, and then also on the Cloud-Native Geo Slack (join with this link, and #admin-partitioned-big-data is the channel centered on this). Shout out to Darren Weins, Felix Schott, Matt Travis, and Youssef Harby for jumping in early. I think one cool thing is that the effort is not just code – it's evolving the data sets on Source Cooperative and putting more up, enhancing code, and then I think as it starts to coalesce we may make some sort of standard. I'm not sure exactly how to organize it – for now the Open Buildings issue tracker just has a scope beyond code. It'll likely make sense to expand beyond that repo; indeed we're already doing more data than just buildings.
But please join us! Even if you’re not an amazing coder – I’m certainly not, as it’s been 15+ years since I coded seriously, but ChatGPT & Co-pilot have made it possible. New developers are welcome, and there will also be ways to help if you don’t want to code.
TL;DR: We're excited to introduce the Cloud-Optimized Geospatial Formats Guide to help you navigate the ever-expanding universe of cloud-native geospatial technologies. Many thanks to NASA's Interagency Implementation and Advanced Concepts Team (IMPACT) and Development Seed for their leadership in creating this guide and opening it up to the community.
Last week, NASA announced that NASA's Level-1 and Atmosphere Archive and Distribution System Distributed Active Archive Center (LAADS DAAC) has moved all of its storage to the cloud. According to a lead on the project, this represents 5.7 petabytes of data across 73 million data files. What's more, the project was completed almost a year ahead of schedule. This is just one anecdote of many that shows how quickly geospatial data is moving to the cloud.
This rapid migration to the cloud has created a movement toward “cloud-native” geospatial applications that take advantage of the scalability and performance of cloud infrastructure. Fueling these applications are new cloud-optimized geospatial data formats that are scalable, accessible, and flexible enough to be used in a wide array of applications.
To help you navigate the expanding universe of cloud-optimized geospatial data formats, we’re excited to introduce the Cloud-Optimized Geospatial Formats Guide.
Led by NASA's Interagency Implementation and Advanced Concepts Team (IMPACT) and Development Seed, the Cloud-Native Geospatial Foundation is hosting this community-powered guide to give newcomers and experts a single place to learn about best practices for working with data in the cloud. This guide is managed on GitHub and openly licensed. We encourage community contributions to keep it up-to-date and valuable for all.
Cloud-Optimized Geospatial Formats.
Here is what to expect in the guide:
Advanced topics will explore visualizing various data types (e.g., Zarr, GeoParquet) in browsers and Jupyter notebooks, mastering HTTP range requests, assessing chunking and compression configurations, and benchmarking performance for different use cases, such as time series generation vs spatial aggregations. Join the conversation in the discussion board to connect with a community of like-minded individuals, share insights and seek help.
A sincere thank you to our dedicated authors, Aimee Barciauskas, Alex Mandel, Kyle Barron, and Zac Deziel. Thank you also to contributors of the Overview Slides: Vincent Sarago, Chris Holmes, Patrick Quinn, Matt Hanson, and Ryan Abernathey. Their expertise and commitment have been instrumental in bringing the guide to life.
At the end of September, the Spatio-Temporal Asset Catalog (STAC) community members gathered together in Philadelphia (and virtually) to improve STAC, grow the ecosystem of tools around STAC, and discuss other complementary cloud-native geospatial projects. This was the 8th STAC Sprint and the first in-person sprint since 2019. After three days of effort, we made some great strides across the board.
Everyone gathered the first morning of the sprint.
Based on the attendees of the sprint and their areas of expertise, we split up into four breakout groups. These groups were:
The following sections outline what each group accomplished during the three days of the sprint and the next steps for each topic.
A considerable benefit of the sprint was to get together in person and resolve some of the community’s longstanding online STAC discussions. Going into the sprint, there were around 10 issues with the label “discussion needed” in the stac-spec GitHub issues. During the sprint, each of these issues was discussed and a solution was determined. You can see the issues that will be completed as a part of the stac-spec 1.1.0 release in the 1.1 Milestone.
One of the most notable solutions made for the STAC Specification was the agreement on the Bands Request for Comments discussion. The agreed-upon solution is to create the new common metadata field bands to replace eo:bands and raster:bands, as well as to add the following fields to the common metadata: data_type, nodata, statistics, and unit. You can see the pull request for this change at radiantearth/stac-spec/pull/1254.
The STAC community will continue to work towards completing the issues in the 1.1 Milestone before releasing a v1.1.0. If you have any problems with the STAC Spec, now is the time to make your voice heard. Submit issues to the respective STAC repository with an in-depth explanation of your problem: stac-spec for STAC Specification issues and stac-extensions for issues with STAC Extensions.
As for the STAC API Specification, the group focused mainly on discussing several STAC API Extensions. Here are the discussed extensions and their new status:
Stable:
For the STAC API Specification, the goals moving forward are to get the recent updates to CQL2 released for the Filter extension, advocate for updating implementations (including a new implementation in stac-server), and continue to engage with the OGC team on the future of CQL2 (including separating functionality into conformance classes that we expect implementers will be able to support).
For the STAC ecosystem group, the majority of the work done was for PySTAC. A lot of work was done around extensions (notable work on the pointcloud extension for a v1.1) and continued development for v1.9.0.
Additionally, the group is in the process of developing a new, simpler interface for extensions and an extensions audit to ensure all versions are supported and tested.
As far as future efforts go, the stac-utils folks are following the bands Request for Comments (RFC) progress (decision discussed above) and have a work-in-progress Pull Request to add support to PySTAC when the RFC is accepted.
If you are interested in joining stac-utils virtual meetups, be sure to join the STAC Community Google Group to receive meeting invitations.
There is now a work-in-progress specification for STAC-GeoParquet which you can find at stac-utils/stac-geoparquet/pull/28. The current goal is to turn this into a more official specification with a more diverse set of datasets that meet the specification requirements and evolve the stac-geoparquet library to be a bit more generic.
I led a small group discussion around STAC outreach and education. The goal of this group was to identify how we can expand the STAC community to include a more diverse crowd.
During this sprint, we focused on developing tutorials that target non-Python users. A huge shoutout to Mike Mahoney for developing three stellar tutorials that are now on the STAC website. These tutorials expand our STAC education into R – Download data from a STAC API using R, rstac, and GDAL and How to Query STAC APIs using rstac and CQL2 – and into the command line with the CLI data download tutorial. If you want to read more about cloud-native tools for non-Python users, check out Mike's blog from a few weeks back: "Cloud-Native Geospatial If You Don't Speak Snake".
A few more tutorials from the sprint are still in progress and will be added to the STAC site tutorials section soon including an improved STAC Extensions tutorial by Dimple Jain and a tutorial on creating a STAC Catalog via the command line by Mansi Shah.
For STAC documentation, a new STAC FAQ page was further developed (we first started building this at a STAC working session this fall) and will be added to the site in the coming months.
In addition to material creation, this group discussed important topics including barriers to entry into the STAC community and how to make STAC more accessible to newcomers. The Cloud-Native Geospatial Foundation will be holding a series of introductory webinars on STAC and cloud-native geospatial in the coming months. The first is an introduction-to-STAC webinar for the Kenyan Space Agency (though it is free and open to the public) this Wednesday, October 18th at 9 a.m. EST. More information about this webinar can be found here.
In addition to collaborative work efforts, some attendees shared updates on the ways they are using STAC in their personal work. Each morning, we had a slot for 5-minute lightning talks or demos. It was great to see the variety of ways STAC is being harnessed in the work of the community. You can find the presentation slide deck here.
If you want to know more about a given lightning talk or specifics about any of this work done at the sprint, join the Cloud-Native Geospatial Foundation Slack organization and/or come to the STAC Community meetups (every other Monday at 11 a.m. EST) to meet many of the STAC Sprint participants and hear more about all the work they are doing related to STAC. You can join the STAC Community Google group to be added to the biweekly meeting and receive STAC-related emails: groups.google.com/g/stac-community.
In March, Jed described the “Naïve Origins of Cloud Optimized GeoTIFF” – an access pattern and ecosystem that revolutionized data delivery for AWS, its customers, and the public sector.
COG is an established technology for producing and consuming imagery, but there’s a missing half of geospatial: vector data. Organizations that work with imagery are wrangling vectors, too: consider building footprints, tasking areas, parcels, agricultural plots, and ML labels. COG’s accessibility advantages ought to carry over to vector workflows.
Bringing COG’s benefits to vector data is a design goal of PMTiles, a cloud-native format for visualization with tile pyramids. It’s meant to be a useful complement – not alternative – to COG, GeoParquet and FlatGeobuf.
The key differences between COG and Regular Old GeoTIFF are internal tiling and overviews.
This combination enables access via Range Requests on commodity object storage, making applications simple to build and deploy.
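If you have not seen it before, the whole trick is just the standard HTTP Range header. A toy example, with a placeholder URL, looks like this:

```python
import requests

url = "https://example.com/data/ortho.tif"  # placeholder COG or PMTiles archive
resp = requests.get(url, headers={"Range": "bytes=0-16383"})  # first 16 KB only

print(resp.status_code)  # 206 Partial Content if the server supports ranges
print(len(resp.content), "bytes fetched instead of the whole file")
```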
Another feature of COG is backwards compatibility. Adoption is effortless, since applications that read GeoTIFF, read COG. But backwards compatibility is also a limitation: COG works only for data represented by pixels.
Two emerging solutions for cloud-native vector are FlatGeobuf and GeoParquet.
Both FlatGeobuf and GeoParquet qualify as cloud-native vector formats. Neither format has internal tiling or overviews, which limits their usefulness for certain applications:
If a dataset consists of points, or uniformly-sized polygons like buildings, a spatial index is as good as internal tiling. But a dataset with irregularly sized features is more difficult to deal with. If a single feature contains 10,000 vertices – like the complex boundary of a protected wildlife area – any queries that touch this feature fetch the entire vertex sequence.
Spatial indexing applied to features of different sizes
Tiling a COG is obvious, since the pixels are gridded. But tiling of a vector dataset involves non-trivial clipping of polygons into bite-sized parts.
Clipping a vector shape into tiles
What about overviews? COG has an obvious strategy: each overview downsamples by 2, with well-known resampling.
Building raster overviews
What is an overview for vector? It's essential for the astronaut's eye view – a view of the whole dataset on a map. The solutions are ad-hoc: you could create duplicate, simplified versions of every feature, with attributes specifying which zoom level each should appear at. But that approach doesn't solve which features to eliminate, since including every feature of a million-row dataset at zoom 0 is impossible.
Different strategies for building vector overviews - dropping vs. merging. Basemap © OpenStreetMap
FlatGeobuf and GeoParquet are analysis-focused formats. They’re useful for answering queries like What is the sum of attribute A over features that overlap this polygon? But their design does not enable cloud-native visualization like COG does.
Tiling and overviews of vector data is best accomplished with vector tiles. The de-facto standard, implemented by PostGIS and GDAL, is the open MVT specification by Mapbox – an SVG-like format using Protocol Buffers.
The best-in-class tool for creating vector tiles from datasets like FlatGeobuf and GeoParquet is tippecanoe, originally developed by Mapbox, but since v2.0 maintained by Felt. Tippecanoe doesn’t just slice features into tiles, it generates smart overviews for every zoom level matching a typical web mapping application. It adaptively simplifies and discards features, using many configuration options, to assemble a coherent overview of entire datasets with minimal tile size.
The last missing piece is a cloud-friendly organization of tiles enabling efficient spatial operations. This is the focus of my PMTiles project, an open specification for COG-like pyramids of tiled data, suited to planet-scale vector mapping. PMTiles, along with similar designs like TileBase and COMTiles, can be read directly by web browsers, meaning they work great as items referenced in SpatioTemporal Asset Catalogs.
Chris Holmes’ Google Open Buildings dataset on Source Cooperative contains GeoParquet files for different administrative regions.
Using Planet’s gpq command line tool to read the Cairo dataset, in concert with Tippecanoe:
gpq convert Cairo_Governorate.parquet --to=geojson | tippecanoe -o Cairo.pmtiles
The 105 MB GeoParquet input turns into a 54MB PMTiles archive, which can then be dropped directly into the PMTiles Viewer:
GeoParquet turned into PMTiles for visualization. Source: Google Open Buildings via Source Cooperative.
This 54MB archive can be stored on S3 and enables simple deployment of interactive visualizations – a useful complement to analysis-focused vector formats.
PMTiles is the foundation of the Protomaps open source project – an ecosystem of tools, libraries and data for geospatial visualization. Protomaps publishes its datasets to Source Cooperative, and helps organizations of all sizes transition to cloud-native mapping. It’s supported through commercial development projects, a fellowship through the Cloud-Native Geospatial Foundation, and GitHub Sponsors. You can learn more at protomaps.com.
Introducing the 'Murena 2'
https://murena.com/shop/smartphones/murena-2/?wcpbc-manual-country=GB
And in case you're wondering... yes, I do use a Murena-based Samsung Galaxy as my daily driver!
One of the things I’m most proud of about GeoParquet 1.0.0 is how robust the ecosystem already is. For the 1.0.0 announcement, I started to write up all the cool libraries and tools supporting GeoParquet, and the awesome data providers who are already providing interesting data in the spec. But I realized it would at least double the length of the blog post to do it justice, so I decided to save it for its own post.
The core of a great ecosystem is always the libraries, as most tools shouldn’t be writing their own special ways to talk to different formats – they should be able to leverage a well-optimized library so they can spend their time on other things. The first tool to support GeoParquet, before it was even called GeoParquet, was the great GeoPandas library. They added methods to read and write Parquet, and then Joris Van den Bossche and Dewey Dunnington started the process of standardizing the geospatial parquet format as part of GeoArrow as they explored cross-library reading and writing. Joris wrote the GeoPandas Parquet methods, and Dewey worked in R, working on the GeoArrow R library (there’s also another R library called sfarrow). Their initial work became the core of the GeoParquet spec, with a big thanks to Tom Augspurger for joining their community with the OGC-centered one where I started. We decided to use the OGC repo that I was working in for the GeoParquet spec, and to keep the GeoArrow spec repository focused on defining Geospatial interoperability in Arrow. Arrow and Parquet are intimately linked, and defining GeoParquet was much more straightforward, so we all agreed to focus on that first.
The next two libraries to come were sponsored by Planet. Even Rouault added drivers for GeoParquet and GeoArrow to GDAL/OGR, and while he was at it created the Column-Oriented Read API for Vector Layers. The new API enables OGR to ‘retrieve batches of features with a column-oriented memory layout’, which greatly speeds things up for reading any columnar format. And Tim Schaub started GPQ, a Go library for working with GeoParquet, initially enabling conversion of GeoJSON to and from GeoParquet, and providing validation and description. He also packaged it up into a webapp with WASM, enabling online conversion to and from GeoParquet & GeoJSON. Kyle Barron also started working extensively in Rust, using it with WASM as well for browser-based tooling. Bert Temme also added a DotNet library. For Python users who are working in row-based paradigms, there is also support in Fiona, which is great to see. And there’s a GeoParquet.jl for those working in Julia.
And the most recent library to support GeoParquet is a pure javascript one in loaders.gl, thanks to Ib Green. It will likely not support the full Parquet spec as well as using something like GPQ in WASM, since there is not a complete Javascript Parquet library. But it will be the most portable and easy to distribute option, so is a great addition.
The one library I’m now still hoping for is a Java one. I’m hoping someone in the GeoTools & GeoServer community takes up the call and writes a plugin, as that one shouldn’t be too difficult. And of course, we’d love libraries in every single language, so if you have a new favorite language and want what is likely a fairly straightforward programming project that will likely see some widespread open source use, then go for it! There are pretty good Parquet readers in many languages, and GeoParquet is a pretty minimal spec.
This is a much wider category, and I’m sure I’ll miss some. I’m loosely defining it as something a user interacts with directly, instead of just using a programming language (yes, an imperfect definition, but if you want to define it you can write your own blog post ;) ).
I count command-line interfaces as tools, which means I’ll need to repeat two of the libraries above:
It's super easy to install on any machine, as Go has a really great distribution story, and it is the main tool I use for validation and when I'm working with Parquet data that's not GeoParquet. The most recent release also does a really nice job describing your GeoParquet:
For traditional GIS workflows, QGIS uses GDAL/OGR, so can easily read GeoParquet if the right version of GDAL/OGR is there.
For Windows this works out of the box using the main QGIS installer (OSGeo4W). Unfortunately the Mac OS/X installer doesn’t yet include the right dependencies, but if you’re on a Mac you can install QGIS with conda. Just install conda and then run conda install qgis libgdal-arrow-parquet, then start QGIS from the terminal (just type ‘qgis’). This is a bit annoying, but it’s currently how I always use QGIS, and isn’t bad at all. For Linux Conda works well, just install QGIS with the same command to be sure the right GDAL libraries for GeoParquet are there. The main QGIS installer might also work (hopefully someone can confirm this).
Safe Software’s FME also added GeoParquet support in version 2023.1. It’s probably the best tool on the market for transforming data, so it’s great to see it supported there.
There’s a ton of potential for GeoParquet on the web, and it’s been great to see many of the coolest web tools embracing it. The first to support GeoParquet was Scribble Maps. You can easily drag a GeoParquet on it and everything ‘just works’, and you can also export any layer as GeoParquet.
It’s a great tool that’s been adding tons of great functionality recently, so definitely check it out. They use a WASM distribution of GPQ under the hood, showing off how capable the browser is becoming.
You can also make use of a web-based GeoJSON converter, available on geoparquet.org, that’s also powered by GPQ.
CARTO, my long-time favorite web-based analysis platform, added GeoParquet support recently. They've been one of the biggest supporters of GeoParquet, with Alberto Asuero helping with core spec work and Javier de la Torre leading lots of evangelizing of GeoParquet. You can easily import GeoParquet, and they're starting to support export as well.
There's also been a lot of awesome experimental work that's not quite 'tools' yet, but shows off the potential of GeoParquet on the web. Kyle Barron has been doing a lot with Rust and WASM, really pushing the edge of what's possible in the browser. Not all of his work uses GeoParquet directly, but it shows just how far the browser can go in displaying massive amounts of geospatial data. Under the hood he uses a lot of Arrow, which is closely related to Parquet, and GeoParquet will be the preferred way to get data into his tools. You can check out his awesome Observable notebooks and learn a ton about the bleeding edge of geospatial data in the browser:
There’s also cool experiments like Overture Maps Downloader, which uses DuckDB in Web Assembly to interact with Overture Map GeoParquet files up on source.coop. You can check out the source code on github.
Also, any web tool that uses loaders.gl will likely soon support GeoParquet as they update to the latest release. Chief among those are deck.gl and Cesium, so hopefully we can fully add both to the list soon.
There are also a couple of nice little tools to help with specific conversions. The BigQuery converter was built by the Carto team to help get valid GeoParquet out of BigQuery. Our hope is that BigQuery supports GeoParquet directly before too long (if you're a BigQuery user, please tell your Google account manager you're interested in it, or let me know and I can connect you with the PM), but in the meantime it's a great little tool. There's also a tool to convert any STAC catalog to GeoParquet, called stac-geoparquet. It powers Planetary Computer's GeoParquet versions of their STAC Collections, and is evolving to be less tied to Azure.
A few of the above things (FME, Carto) could reasonably be called frameworks, and I'm using this category as a bit of a catchall. Apache Sedona was an early adopter and promoter of GeoParquet, and you can do incredible big data processing of any spatial data with it. Wherobots is a company formed by core contributors to Sedona; they're working on some exciting stuff on top of it and have been great supporters of the ecosystem. Esri also has support for GeoParquet in their ArcGIS GeoAnalytics Engine, which lets Esri users tap into big data compute engines. Seer AI is also a new, powerful platform that uses GeoParquet at the core, and it's easy to import and export data. The other tool worth mentioning is DuckDB. I recently wrote up my excitement for it. It doesn't actually yet directly support GeoParquet, but its Parquet and Geo support are both so good that it's quite easy to do. I'm optimistic they'll support GeoParquet for real soon.
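To illustrate what 'quite easy' means in practice, here is a small sketch of the workaround with DuckDB's Python API and spatial extension. The file name is a placeholder, and going the other direction you would still want something like gpq to turn DuckDB's plain Parquet output into valid GeoParquet, as mentioned earlier.

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL spatial;")
con.sql("LOAD spatial;")

# Read plain (Geo)Parquet, then turn the WKB blob into a geometry DuckDB understands.
con.sql("""
    SELECT id, ST_AsText(ST_GeomFromWKB(geometry)) AS wkt
    FROM read_parquet('Jamaica.parquet')
    LIMIT 5
""").show()
```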
I had some ambitious goals to get a ton of data into GeoParquet before 1.0.0, that we didn’t quite meet. But in the few weeks past 1.0 it’s been accelerating quickly, and I think we’re just about there.
One goal was ‘data providers’, the number of different organizations providing at least some sort of data in GeoParquet. Microsoft has been incredible on this, producing GeoParquet versions of all their STAC catalogs, and also really showing off the power of partitioned GeoParquet with their building footprints dataset.
The second producer to put data up as GeoParquet was Planet, converting the RapidAI4EO STAC dataset into GeoParquet, and Maxar also converted their Open Data STAC to GeoParquet.
But it's not just all STAC! I'm most interested in more non-STAC datasets, and they are starting to emerge. Carto has started making their Data Observatory available in GeoParquet, and there are tons of great datasets (demographics, environmental, points of interest, human mobility, financial, and more) there. Not all are fully available yet, but if you're a Data Observatory user and want particular datasets as GeoParquet they can likely prioritize it.
The Ordnance Survey, UK’s national mapping agency, has also put their National Geographic Database – Boundaries Collection in GeoParquet, with a number of great data layers. And one of the datasets I’ve been most excited about in general was released as GeoParquet by VIDA, a cool company in the Netherlands working on climate risk. They combined the Google Open Buildings dataset with the Microsoft Building Footprint open data into a single dataset, it’s available on Source and you can read about it in their blog post.
It’s got over 2.5 billion rows in almost 200 gigabytes of data, partitioned into individual country GeoParquet files. There’s also a nice version of the ESA World Cereals dataset created by Streambatch, they wrote up a great tutorial on converting it to GeoParquet, and the final data file is available on Source Cooperative.
One thing you’ll notice if you follow many of these links is that most are hosted on Source Cooperative, a new initiative from Radiant Earth. You can find lots of cloud-native geospatial data there, and it’s emerging as a central location for open data. I’ve also just been converting data into GeoParquet and hosting it there, you can get the Google Building Footprints, the Overture data as GeoParquet (it was released as Parquet but not GeoParquet), EuroCrops and the NYC Taxi Zones from TLC Trip Record Data. I’m hoping to work on converting many more datasets to GeoParquet, and if anyone else wants to join me in that mission I can get you an invite to Source Cooperative if you’d like a place to host it.
And a last minute addition as I’m almost done writing this post, Pacific Spatial Solutions translated a ton of Tokyo building and risk data into GeoParquet, you can find it in the list of data in their github repo.
So I think we’ve made a really great start on a robust ecosystem. But our goal should be to go much, much further. I believe GeoParquet is fundamentally a better format than the other geospatial formats. But just having great properties as a format doesn’t necessarily mean it’ll be widely adopted. The goal for the next couple of years should be to have GeoParquet be ubiquitous. Every piece of software that understands geospatial data should support the standard. And ideally, it’s the preferred geospatial format of tools that are just starting to support geospatial.
But I believe the real way to measure its success is by the amount of data you can access in it. We can get the ball rolling by putting tons of valuable data up on Source Cooperative. But to really build momentum we need to evangelize to all data providers, especially governments, to provide their data in this format. So if you’re a provider of data please try to make your data available as GeoParquet, and advocate to your software providers to support it. If you’re a developer of geospatial software please consider supporting the format. And if you’re a data user then advocate to your software and data providers, and start saving your data in GeoParquet.
If you want to help out in building the ecosystem but are having trouble prioritizing it, we’ve got money to help. The OGC will put out a Call for Proposals very soon, asking for submissions to enhance the GeoParquet ecosystem. So if you’ve got a library you want to build, a piece of software you want to enhance to read and write GeoParquet, or some data you want to convert and make widely available as GeoParquet then keep an eye out for it and apply.
There are also a lot of really exciting things coming up for the evolution of GeoParquet, that I think will take it from a format that’s better in many little ways to enabling some things that just aren’t possible with today’s geospatial formats. But that’s a topic for future blog posts. For now, I just want to thank everyone who’s been early in support of GeoParquet — it’s been a great community to collaborate with, and it feels like we’re on the verge of something really exciting. And if you’d like to join the growing community, we just started a #geoparquet channel on the Cloud Native-Geospatial Foundation slack (click the link to join). And if you’d like to join our community meetings (17:00 UTC time, every other week) then sign up for this google group and you should be added to the calendar (we have no plans to actually use the forum — use slack instead).
Now, at VIDA, we’ve taken it a step further. We’re excited to share that by merging the Google and Microsoft datasets, we’ve created the most comprehensive, freely available, global, cloud-native building footprint dataset available today. It’s hosted on the Source Cooperative and accessible on map.vida.place/explore.
The dataset is freely available for download on Source Cooperative.
At VIDA, we rely heavily on BigQuery for handling large-scale geospatial tasks. Given its robust features and efficient processing capabilities, it is our platform of choice.
The VIDA platform is used in more than 20 countries to plan and de-risk sustainable infrastructure investment projects. Alongside commercial building footprint datasets, we routinely use the openly available Google as well as Microsoft building footprint datasets. We have found that for different areas, one or the other dataset was superior.
In order to get the best from both datasets, our goal was to create a unified dataset we could use globally. The challenge was to ensure that overlapping footprints from Microsoft’s dataset, which were already present in Google’s dataset, were excluded.
In geospatial analytics, handling vast amounts of data efficiently is crucial. To achieve this, we employed techniques like spatial clustering and partitioning. Our partitioning was based on administrative level 0 boundaries using the CGAZ dataset. This decision was influenced by the widespread availability of deep-level administrative boundaries and the distribution pattern of building footprints. Read our blog post for a more detailed overview on our merging approach.
Our merging endeavor yielded some truly impressive statistics:
While BigQuery offers multiple export formats, the absence of direct GeoParquet support posed a challenge. However, the recent release of GeoParquet version 1 brings hope for future integrations. To overcome this, we let DuckDB take over the heavy lifting once the data was out of BigQuery and into our GCS buckets. DuckDB's awesome httpfs extension implements a file system that allows reading and writing remote files. Although we needed a bit of a workaround, mocking GCS as an S3 URI, the integration worked seamlessly. Using DuckDB we merged and exported the Parquet files based on our partitioning schemes and subsequently utilized gpq and ogr2ogr to craft GeoParquet and FlatGeobuf files. The final touch was the creation of PMTiles archives, both at the country and global levels.
While our initial partitioning based on level 1 administrative boundaries was effective and straightforward, we noticed that some Parquet files were excessively large, affecting performance. Our solution? A combination of administrative level 1 partitioning and further splitting based on the S2 grid. This dual approach, inspired by the open-buildings tool from Chris Holmes, ensures optimal performance. Using BigQuery's native S2_CELLIDFROMPOINT function, we were able to assign each building footprint to an S2 grid cell, making sure no cell contained more than 20 million buildings.
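The shape of that query is roughly the sketch below; the S2 level and the table and column names are illustrative, not our exact parameters.

-- Rough sketch: tag each footprint with an S2 cell id (computed from its
-- centroid) so oversized partitions can be split further. The level and the
-- table/column names are placeholders.
SELECT
  b.*,
  S2_CELLIDFROMPOINT(ST_CENTROID(b.geometry), level => 6) AS s2_cell_id
FROM `project.merged.buildings` AS b;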
Our integrated dataset is now accessible on Source Cooperative in various formats, including FlatGeobuf, GeoParquet, and PMTiles. Go check it out at https://beta.source.coop/vida/google-microsoft-open-buildings. We've employed a mix of partitioning strategies, focusing on administrative level 0 and a combination of level 0 with S2 grids. You can also view the dataset directly at map.vida.place/explore, where we use a small serverless middleware to translate the PMTiles archive into a technology-agnostic XYZ tile URL. Sign-up is free!
As we continue our work, we plan to refine our partitioning techniques, with a focus on integrating both administrative levels 0 and 1. This approach has already shown promising results in certain test regions. We invite the tech community to explore our dataset and join us in pushing the boundaries of geospatial data analysis!
The Python ecosystem for open-source cloud-native geospatial tooling is fantastic. Projects like fsspec make it easy to work with cloud storage as if it were local, dask enables scaling computation from a single node to an entire server farm, and libraries like zarr, kerchunk, and rasterio make it easy to read and write spatial data, which can then be analyzed with projects like pandas and xarray.
But – and this might surprise a number of Python users! – some people don't know Python, and some prefer to use other tools. Are those users simply out of luck?
At the 2023 ESIP July Meeting, Alexey Shiklomanov organized an UnConference session around this very topic. We came away with quite a list of success stories of people using non-Python tools for cloud-native geospatial workflows, along with a few pain points that still need to be addressed to unlock the potential of other languages for cloud-native geospatial workflows. Here are a few of each.
Alexey Shiklomanov leading the Unconference session at the 2023 ESIP July Meeting.
Perhaps the biggest surprise from the session is that there are many non-Python tools available for cloud-native geospatial workflows.
The biggest open secret in spatial open source development is that huge chunks of most projects are fundamentally wrappers around three of the oldest spatial open-source libraries, namely GDAL, PROJ, and GEOS.
GDAL describes itself as a “translator library for raster and vector geospatial data”, making it easy to read and write spatial data in almost any format. GDAL’s virtual file system functionality is particularly useful for cloud-native workflows, letting users read data over network connections as if it were a local file.
PROJ meanwhile is a “generic coordinate transformation software”, providing standard translations between coordinate reference systems for cartographic and geodetic data. PROJ can be used to reproject spatial data directly, or indirectly via GDAL’s warping functionality.
Last but not least, GEOS is a library for “computational geometry with a focus on algorithms used in geographic information systems (GIS) software”, providing efficient algorithms for common spatial predicates and operations.
These core tools are highly optimized for reading, reprojecting, manipulating, and writing geospatial data, and work well in serverless environments and with data stored in cloud storage. Both PROJ and GDAL have useful command line interfaces, meaning they can be used directly in just about any cloud-native geospatial workflow without requiring a specific language runtime. It's possible that most users of these libraries will never actually interact with the CLI, however; because GDAL, PROJ, and GEOS all provide C/C++ APIs, they can be wrapped into interfaces that allow users to take advantage of these libraries from other languages. For instance, GDAL has interfaces for Rust, Java, Python, and more, allowing users in those languages to access these lower-level tools directly and build wrappers on top of their functionality.
In addition to these core spatial libraries, there are a handful of language-agnostic tools that can help users analyze their data’s spatial and non-spatial attributes, similar to geopandas or xarray. Chief among these is PostGIS, a spatial extension to the venerable PostgreSQL database system that allows executing SQL queries against spatial databases from any language that supports ODBC interfaces. PostGIS can make it easy for users to access and analyze remote data sources by accepting queries and returning results over a network connection, making it useful for cloud-native workflows. However, as PostGIS uses a client/server architecture and requires setting up and administering a PostgreSQL server, it can often be too complex for users working on one-off analyses.
DuckDB’s Spatial extension is a very exciting new alternative to PostGIS, providing a spatial query engine without requiring a separate server process to execute queries. This extension uses GDAL to read spatial data in as tables, meaning your local DuckDB process can access files in cloud storage using GDAL’s virtual file system functionality, and is available either via the command line or via a huge number of language interfaces. While the extension still self-describes as a prototype, this query engine is a promising addition to the suite of language-agnostic tooling available for geospatial workflows.
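As a minimal (hypothetical) example of what that looks like in practice – the file name and bounding box below are placeholders, and ST_Read typically exposes the geometry column as geom:

-- Minimal sketch: DuckDB as a serverless spatial query engine. ST_Read goes
-- through GDAL to expose the file as a table; 'parks.gpkg' and the bounding
-- box are placeholders.
INSTALL spatial;
LOAD spatial;

SELECT count(*)
FROM ST_Read('parks.gpkg')
WHERE ST_Intersects(geom, ST_MakeEnvelope(-6.3, 53.3, -6.2, 53.4));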
This is nowhere near an exhaustive list of language-agnostic geospatial tools available. There exist many more low-level libraries with interfaces to multiple languages, such as the s2geometry tool for geometric computations in geographic coordinate reference systems, or exactextract for fast zonal statistics. With more tools being released all the time, the future remains bright for language agnostic cloud-native geospatial workflows.
That said, there are still challenges left in non-Python cloud-native geospatial workflows. I’m going to highlight two in particular: many useful tools don’t have low-level interfaces other languages can take advantage of, and many educational materials focus exclusively on the Python ecosystem.
First off, while a lot of useful tools do exist as low-level libraries that can be wrapped by other languages, a large number of tools in the Python spatial ecosystem don't have any corresponding low-level utility that can be accessed without a Python runtime. There aren't C/C++/Rust equivalents for dask, zarr, or xarray, and so workflows or tutorials that are written with these tools in mind are currently inaccessible to other languages.
If projects structure themselves by writing high-level wrappers around standalone low-level libraries, it's relatively easy to add additional language interfaces to these tools as needed. But this is a lot of work, particularly given that most open-source projects start out as prototypes in higher-level languages, with essential components moved to lower-level code over time as performance constraints demand. As such, many useful tools are currently locked in to a single language ecosystem.
Secondly, there’s a dearth of educational materials for non-Python users looking to tackle geospatial analysis projects. This is starting to change; in particular, there are many introductory books on using R for spatial analysis and data science, with more released every month. However, many specialized topics still only have official tutorials for Python users, and it would be useful for there to be more resources on using alternative tools for advanced spatial workflows.
I've focused so far on language-agnostic tools that have interfaces to multiple languages or the command line, as these tools are broadly useful no matter what language a user prefers working with. It's worth spending some time looking specifically at the R ecosystem, however, as it is likely the next-largest language for geospatial workflows. R has a huge number of spatial libraries available, making it easy to download and wrangle spatial data. That said, there are some gaps in what R has available, and some challenges with what R does provide that make a cloud-native workflow harder.
As I wrote last year, I absolutely love the core tooling available for working with spatial data in R. The sf package wraps GDAL, PROJ, and GEOS to provide a flexible and expressive vector toolkit, making it easy to read, manipulate, and write vector data from R and integrating naturally with dplyr and the rest of the R ecosystem for analysis workflows. The terra package provides a similar service for raster data, giving R users access to a highly-performant, fully-featured raster toolkit. Both of these packages rely on GDAL for their IO, meaning they can use GDAL’s virtual file system functionality to efficiently access cloud storage.
The objects returned by these packages integrate well with the ggplot2 ecosystem for cartography and data visualization, with a host of helper packages including tidyterra and ggspatial providing spatial-specific extensions. There are also a host of packages that focus on wrangling, analyzing, and modeling sf and terra objects, plus many more for accessing and downloading remote data sources.
Last but not least, the standard of documentation for R packages is incredibly high, with packages mandated to have standardized man pages explaining a function’s purpose, its arguments and return values, and providing helpful examples that get run as part of a package’s CI. These man pages are then often converted into beautiful bootstrap-based HTML documentation websites, courtesy of the pkgdown package. Many R users say that it’s the high caliber of documentation that got them to start using R, and that keeps them focused on the language.
All that said, there are still some challenges with R’s spatial ecosystem.
First and foremost, none of the packages I just mentioned have standalone low-level libraries available for other languages to wrap. That means that some of the real strength of these libraries is locked within the R ecosystem. For instance, terra provides functions for focal statistics and for averaging overlapping pixels when merging raster tiles, both of which are difficult to do using lower-level libraries. Right now, taking advantage of these functions requires an R runtime. Just to be extremely clear, I am not saying that anyone in particular should do the difficult work of extracting these components into lower-level libraries that multiple languages can wrap; I am, however, saying that doing so would be useful, if someone had the interest or funding to do so.
Secondly, R is missing some of the "sugar" that makes it easy to write cloud-native Python code. There isn't a perfect R equivalent to fsspec, meaning R users need to handle remote filesystems differently than they might local ones. While there are several parallelization backends available for R, these backends don't scale to compute clusters or handle task scheduling as easily as Dask. There's still space in R for libraries that either wrap or re-implement some of these core workflow tools.
Third, R packages are often not designed with serverless workflows in mind, and in particular they often (due to CRAN requirements!) attempt to write to and read from temporary directories that may or may not exist. This can require clever workarounds, or patches to the packages themselves.
And last but not least, as mentioned earlier, R users would benefit from more tutorials for advanced cloud-native workflows. This would be a fantastic place for the community – and funders – to invest in the near future, in order to help more users take advantage of cloud resources and improve their geospatial workflows going forward.
The GeoParquet community is pleased to announce the release of GeoParquet 1.0.0. This is a huge milestone, indicating that the format has been implemented and tested by lots of different users and systems and the core team is confident that it is a stable foundation that won’t change. There are more than 20 different libraries and tools that support the format, and hundreds of gigabytes of public data is available in GeoParquet, from a number of different data providers.
I gave a good bit of the backstory in the beta.1 announcement, but the main driving push has been to settle on one standard way to encode geometries in the Apache Parquet format. The immediate goal has been to enable spatial interoperability among the set of modern data science tools (BigQuery, Snowflake, Athena, DuckDB, etc) that leverage Parquet to great effect and increasingly have geospatial support. Though most of those do not yet support GeoParquet it is likely on many of their roadmaps, and providing the stable base of 1.0.0 should make it even easier for them to adopt.
But in the meantime GeoParquet has emerged as just a great geospatial format, with support in many geospatial libraries and tools, and I think it has the potential to be a core Cloud-Native Geospatial distribution format and a go-to for day-to-day geospatial work.
The core reason it's becoming everyone's favorite new format is that it's simply faster and smaller than the competition. I wrote a blog post on some testing I did exploring write performance and file size, and intend to make some testing tools for read performance as well. For those that don't want to read the full post, the short version is that a typical GeoParquet file comes out substantially smaller than the equivalent GeoPackage, Shapefile, or FlatGeobuf.
The main reason for this is that Parquet is compressed by default. The other formats can be zipped up, but then they aren't actually usable until you unzip them. GeoParquet's speed is also quite impressive compared to other formats, mostly due to the fact that it's a columnar format instead of a row-oriented one, and it has a large ecosystem of tools that have really optimized its performance.
I think the most impressive thing about GeoParquet is how robust the ecosystem has become, before we even got to 1.0.0. I fully believe this is just the start, and that in no time at all it’ll be weird for a geospatial tool to not support GeoParquet, and many non-geo tools will have it as their only native geospatial option. I’ll do a full post on the amazing ecosystem soon, but you can get a quick sense from the list of tools and libraries on geoparquet.org:
We’re also starting to see data providers like Microsoft, Maxar, Planet, Ordnance Survey and others put new data in GeoParquet. And the community is also converting a number of interesting large scale datasets like the Google Open Buildings and Overture Maps data to GeoParquet on Source Cooperative.
The release of 1.0.0 is truly just the beginning. We're taking it through the full Open Geospatial Consortium standardization process, and we've started forming an official GeoParquet Standards Working Group. We hope to move through the standardization process relatively quickly, to become a full, official OGC Standard.
There is also a lot of activity on the GeoArrow specification, which will form the basis of a columnar geometry format for GeoParquet 1.1.0. That has a lot of potential to make the format and tools around it even more performant.
We’re excited to see this community grow, with more data, more tools and more innovation. Please help us by converting data into GeoParquet, demanding GeoParquet from your data providers, and building tools to make use of it. And let us know when you do, so everyone can keep track of the growth of this exciting community.
Anyone who has been following me closely the last couple of months has picked up that I’m pretty excited by DuckDB. In this post, I’ll delve deep into my experience with it, exploring what makes it awesome and its transformative potential, especially for the geospatial world. Hopefully, by the end of the post, you’ll be convinced to try it out yourself.
So I think I first heard about DuckDB maybe six months ago, mentioned by people who are more aware of the bleeding edge than I am, like Kyle Barron. My thought on hearing about it was probably similar to that of the majority of people reading this – why the heck would I need a new database? How could this new random thing possibly be better than the vast array of tools I already have access to? I'm not the type who's constantly jumping to new technologies, and I generally didn't think that anything about a database could really impress me. But DuckDB has somehow become one of those pieces of technology that I gush about to anyone who could possibly benefit. Despite my attempts, I struggled to convince people to actually use it. That was until my long-time collaborator, Tim, gave it a try, and his experience mirrored my sentiments.
So now I’m newly inspired to convince everyone to give it a try, including you, dear reader.
My story is that I came to DuckDB after spending a lot of time with GeoPandas and PostGIS, in my initial attempt to create this cloud-native distribution of Google's Open Buildings dataset on the awesome Source Cooperative. The key thing I wanted to do was partition the dataset and start a discussion to learn about it. Max Gabrielsson, the author of the DuckDB spatial extension, had been following GeoParquet and suggested that a popular use case for it is partitioning of Parquet files – exactly the problem I was grappling with. It didn't handle what I was looking to do, but it was super easy to install and start playing with. The nicest thing for me was that I could just treat my Parquet files directly as 'the database' – I didn't have to load them up in a big import step and then export them out. You can just do:
select * from '0c5_buildings.parquet'
Even cooler is to do a whole directory at once:
select * from 'buildings/*.parquet'
I had also been really struggling with this 100 gigabyte dataset – trying to do it all in Pandas led to lots of out of memory errors. I had loaded it up in PostGIS, which did let me work with it and ultimately accomplished my goal, but many of the steps were quite slow. Indeed a count(*)
took minutes to respond, and just loading and writing out all the data also took half a day. Even ogr2ogr
struggled with out of memory errors for some of my attempted operations.
When I found DuckDB I had mostly completed my project, but it immediately impressed me. I was able to do some of the same operations as other tools, but instead of running out of memory it would just seamlessly start making use of disk instead of hogging all the memory. The other thing that blew me away was being able to do the count of 800 million rows. I could load all the Parquet files up in a single 100 gigabyte database and get the count in near-instant time, which hugely contrasted with PostGIS. I'd guess there's a way to get an approximate count much faster with PostGIS, but with DuckDB I could even just skip the step of loading the data and get the count across all the Parquet files in my directory, also in sub-second time. The other thing that's pretty clear with DuckDB is that it's always firing 'on all cylinders' – it consistently uses all the cores of my laptop, and so just generally crunches things a lot faster. After this experience, I started to reach for DuckDB more to see how it'd do in more of my data-related tasks, and it's continued to be awesome.
In this section, I'd like to explain all the reasons I like DuckDB so much. My overall sense is that there really isn't any single 'killer feature'; it's just a preponderance of small and medium-sized things that all add up to a really great user experience. But I'll highlight the ones that I noticed.
There was no friction for me to get going. There were lots of options to install DuckDB, and it all ‘just worked’ – I had it running on my command-line in no time.
You just type 'duckdb'
and you can start writing SQL and be instantly working with Parquet. When you’re ready you can turn it into a table. If you want to save the table to disk then you just supply a filename like duckdb mydata.duckdb
and it’s done. Then later when I started to use it with Python it was just a pip install duckdb
to install it. In Python, I could just treat it the same as any other database, but with much easier connection parameters – I didn’t need to remember my port, database name, user name, etc. Installing the extensions was also a breeze, just:
INSTALL spatial;
LOAD spatial;
This installs all the format support of GDAL/OGR, but I suspect it is much less likely to foobar your GDAL installation, since it manages and uses GDAL locally to DuckDB.
There are lots of nice little touches that make it much more usable. The following highlights two of my favorites:
The first is the progress bar, showing the percentage done of the command. This is in no way essential, but it's really nice when you're running a longer query to get some sense of whether it'll finish in a few seconds or whether it's going to be minutes or even longer. I won't say that it's always perfect; indeed when you're writing out geospatial formats it'll get to 99% and then pause there a while as it lets OGR do its thing. But I really appreciate that it tries.
The second is what gets returned when you do 'select * from'
– it aims to nicely fill your terminal, showing as many columns as it can fit. If your terminal isn't as wide it'll show fewer columns, and it always shows the total number of columns and how many are displayed. My experience with other databases is often more like this:
(I’m mostly hitting spacebar a bunch to try to scroll down, but never get anywhere so hit ‘q’ to quit)
I'm not sure how DuckDB determines which to show, but they're often the ones I want to see, and it's easy to just write a new SQL statement to show exactly the ones you want. It also shows you the first 20 rows and the last 20 rows. If you do an ORDER BY
then you can easily see the range:
I also snuck in a third:
CREATE TABLE CA AS (select * EXCLUDE geometry, ST_GeomFromWKB(geometry) AS geometry from 'CA.parquet');
This EXCLUDE geometry
is one of those commands you’ve always wanted in SQL but never had a way to do it. In most SQL, if you want to leave off one column then you need to name all the other ones, even if it’s like 20 columns with obscure names. It’s cool to see nice innovations in core SQL. And the innovations from DuckDB are spreading, like GROUP BY ALL
, which I haven’t really used yet. It was recently announced in Snowflake, but it originated in DuckDB. A recent blog post called Even Friendlier SQL has lots more tips and tricks, including one I’m starting to use: you can just say FROM my_table;
instead of select * from my_table
!
There are lots of other little touches that can be hard to remember but all add up to a pleasant experience. Things like autocomplete, and using ctrl-c to quit the current operation and again to quit the DuckDB command interface – the timing is just well done, and I never accidentally quit when I don't want to. Another thing I'm excited about but haven't yet explored is doing unix piping in and out of DuckDB on the command line – it seems incredibly powerful.
Another thing I really like is how easy it is to switch between interacting with the database on the command-line and in Python. Since the entire database is just a location on disk I can easily start up the database and inspect it to confirm what I did programmatically. It’s also super easy to move your database, you just move it like any other file, and connect in the new location. The other thing I really appreciate is you can easily see how much space your database is taking up – since it’s just a file you can view it and delete it just like others. It’s not buried in ‘system’ like my tables in PostGIS, and it also doesn’t need to be running as a service to connect to it. (I realize that SQLite offers up this type of flexibility, but it never got into my day to day workflows).
DuckDB also has the ability to work in a completely Cloud-Native Geospatial manner – you can treat remote files just like they’re on disk, and DuckDB will use range requests to optimize querying them. This is done with the awesome httpfs extension, which makes it super easy to connect to remote data. It can point to any https file. The thing I really love is how easy it is to connect to S3 (or interfaces supporting S3, and I think native Azure support is coming soon).
load httpfs; select * from 'https://data.source.coop/cholmes/overture/geoparquet-country-quad-2/BM.parquet';
I mostly work with open data, and for years it was a struggle for me to either figure out the equivalent https location for an S3 location, or to get my Amazon account all set up when I just wanted to pull down some data. With DuckDB you don’t have to configure anything, you can just point at S3 and start working (I concede that Amazon’s Open Data does now have the simple one liner for unauthenticated access, but I think that’s newer, and DuckDB still feels easier).
With S3 you can also do 'glob' matching, querying a whole directory or set of directories in one call and treating it as a single table.
select count(*) from read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-refined/*.parquet');
With really big datasets it can sometimes be a bit slow to do every call remotely, but it’s also super easy to just create a table from any remote query and then you have it locally where you can continue to query it.
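For example, materializing that same Bermuda file as a local table is a one-liner (with httpfs loaded, as above), and from then on every query stays on your machine:

-- Pull the remote GeoParquet file into a local table for follow-up queries.
CREATE TABLE bm_local AS
SELECT * FROM 'https://data.source.coop/cholmes/overture/geoparquet-country-quad-2/BM.parquet';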
DuckDB also really shines with Parquet. The coolest thing about it is that you can just treat any Parquet file or set of Parquet files as a table, without actually ‘importing’ it into the database. If it’s under 500 megabytes / a couple million rows then any query feels fairly instant (if it’s more then it’s just a few seconds to create a table). The syntax is (as usual) just so intuitive:
select * from 'JP.parquet';
select * from '/Users/cholmes/geodata/overture/*.parquet'
If your file doesn't end with .parquet you can use the equivalent read_parquet function, passing the file name as a string: select * from read_parquet('myfile');
And this also works if you’re working with remote files, though performance depends more on your bandwidth. You can also easily write out new Parquet files, with lots of great options like controlling the size of row groups and doing partitioned writes. And you can do this all in one call, without ever instantiating a table.
COPY (select * from 'JP.parquet' ORDER BY quadkey) TO 'JP-sorted.parquet' (FORMAT PARQUET, ROW_GROUP_SIZE 10000)
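A partitioned write is just one more option on that same COPY statement – a quick sketch, where the country column is a placeholder for whatever you want to partition on:

-- Sketch of a partitioned write: each distinct value of 'country' gets its own
-- sub-directory of Parquet files under 'buildings_by_country/'.
COPY (SELECT * FROM 'buildings/*.parquet')
TO 'buildings_by_country' (FORMAT PARQUET, PARTITION_BY (country));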
The one thing that I think can be considered more than a 'little thing' is simply the overall performance of DuckDB. I wrote a post on how it performed versus Pandas and GDAL/OGR for writing out data. There are lots of other aspects of performance to explore, but in my anecdotal experience, it's faster than just about any other tool I try. I believe the main thing behind this is that it's built from the ground up for multi-core systems, and I'm on an M2 Mac that I believe usually has a lot of idle CPU power.
The other thing behind it is just the inherent advantages of a columnar data store for data analysis – which is most of what I’m doing. DuckDB wouldn’t be what you’d use on a scale-out production system that’s write heavy, but that’s not what I’m doing. For just processing data on my laptop it feels like an ideal tool. The columnar nature of it enables things like the ‘count’ to always be sub-second responses, even with hundreds of millions of records. It’s just nice to not have to wait.
The other bit that’s really nice is how it’s rarely memory constrained. Even when you’re using it in ‘in-memory’ mode (i.e. not saving a specific database file), it’ll start writing out temp files to disk when it hits memory limits. I have hit some cases where I do get out of memory errors – I think it’s mostly when I’m trying to process big (8 gigabyte+) files directly to and from Parquet. But doing similar operations by creating huge databases on disk (like 150 gig+) and then writing out seems to go fine. The core team seems to be continually pushing on this, so I think 0.9 may have improvements that make it even better for huge datasets.
DuckDB’s spatial support is quite new, and not yet super mature. But it’s made one of the best initial releases of spatial support, by leveraging a lot of great open source software. The center of that is GEOS, which is the same spatial engine that PostGIS uses. This has enabled DuckDB to implement all the core spatial operations, so you can do any spatial comparison, and use those for spatial joins. You can convert Well Known Text and Well Known Binary with ease, or even X and Y columns – the conversion functions make it quite easy to ‘spatialize’ any existing Parquet data and do geospatial processing with it.
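For instance, 'spatializing' a plain Parquet file that just has lon/lat (or WKT) columns is a one-liner – the file and column names here are placeholders:

-- Build point geometries from plain x/y columns (names are placeholders).
LOAD spatial;
CREATE TABLE places AS
SELECT * EXCLUDE (lon, lat), ST_Point(lon, lat) AS geometry
FROM 'places.parquet';
-- Or, if the file carries WKT instead:
-- SELECT ST_GeomFromText(wkt) AS geometry FROM 'places.parquet';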
The other open source library they leveraged to great effect was OGR/GDAL. You can easily import or export any format that OGR supports, completely within DuckDB SQL commands.
Drivers supported in a typical DuckDB spatial instance
This makes it super easy to get data into and out of DuckDB, and to also just use it as a processing engine instead of a full-blown database where you need to store the data. Those deep in the spatial world will likely question why this is even necessary – why wouldn't you just use GDAL itself to load and export data?
My opinion is that we in the geospatial world need to meet people more than halfway, enabling them to stay in their existing workflows and toolsets. Including GDAL/OGR makes it far easier for people who just dabble in geospatial data to easily export their data into any format a geo person might want, and also really easy to import any format of data that might be useful to them. It’s just:
load spatial;
CREATE TABLE my_table AS SELECT * FROM ST_Read('filename.geojson');
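And the export direction is just as short – a sketch, with the driver name following GDAL's usual short names:

-- Write a table back out through any OGR driver, here GeoPackage.
COPY my_table TO 'filename.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG');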
It’s also nice that it’s embedded and more limited than a full GDAL/OGR install, as I suspect it’ll make it less likely to get GDAL in a messed up state, which I’ve had happen a number of times.
There are definitely some areas where its immature status shows. The geometry columns don't have any persistent projection information, though you can perform reprojection if you do know what projection your data is in. The big thing that is missing, though, is the ability to write out the projection information in the output format you're writing to. This could likely be done with just an argument to the GDAL writer, so hopefully it will come soon. GeoParquet is not yet supported as an output, but it is trivial to read, and it's quite easy to write a 'compatible' Parquet file and then fix it with GPQ (or any other good GeoParquet tool). My hope is that the native Parquet writer will seamlessly implement GeoParquet, so that if you have a spatial column it will just write out the proper GeoParquet metadata in the standard Parquet output (which should then be the fastest geospatial output format).
The biggest thing missing compared to more mature spatial databases is spatial indexing. I’ve actually been surprised by how effectively it’s performed in my work without this. I think I don’t actually do a ton of spatial joins. And DuckDB’s overall speed and performance allow it to just use brute force to chunk through beefy spatial calculations pretty quickly even without the spatial index. I’ve also been just adding a quadkey and then doing ORDER BY quadkey
which has worked pretty well.
More mature spatial databases do have a larger array of useful spatial operations, and they continue to add more great features, so it will take DuckDB quite a long time to fully catch up. But the beauty of open source is that many of the core libraries are shared, so innovations in one can easily show up in another. And the DuckDB team has been quite innovative, so I suspect we may well see some cool ideas flowing from them to other spatial databases.
If you want to get started with DuckDB for geospatial there's a growing number of resources. The post that got me started was this post from Mark Litwintschik. I also put together a tutorial for using DuckDB to access my cloud-native geo distribution of Google Open Buildings on Source Cooperative. And I haven't turned it into a proper tutorial yet, but as I processed the Overture data I started recording the queries I built so that I could easily revisit them, so feel free to peruse those for ideas as well.
There are a lot of potential ways I think DuckDB could have an impact in our spatial world, and indeed has the potential to help the spatial world have a bigger impact in the broader data science, business intelligence, and data engineering worlds.
The core thing that got me interested in it is the potential for enabling Cloud-Native Geospatial workflows. The traditional geospatial architecture is to have a big database on a server with an API on top of it. It is dependent on an organization having the resources and skills to keep this server running 24/7, and to scale it so it doesn’t go down if it’s popular. The ’even more traditional’ architecture is to have a bunch of GIS files on an FTP or HTTP server that people download and load up in their desktop GIS. DuckDB with well partitioned GeoParquet opens up a third way, in the same way that Cloud-Optimized GeoTIFF’s (COG’s) allowed you to make active use of huge sets of raster data sitting on object stores. You can easily select the subset of data that you want directly through DuckDB:
select * from read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-quad/*/*.parquet') WHERE quadkey LIKE '02331313%' AND ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON ((-90.791811 14.756807, -90.296839 14.756807, -90.296839 14.394004, -90.791811 14.394004, -90.791811 14.756807))'))
I’ll explore this whole thread in another post, but I think there’s the potential for a new mode of distributing global-scale geospatial data. And DuckDB will be one of the more compelling tools to access it. It’s not just that DuckDB is a great command-line and Python tool, but there’s also a brewing revolution with WASM, to run more and more powerful applications in the browser. DuckDB is easily run as WASM, so you can imagine a new class of analytic geospatial applications that are entirely in the browser.
There's also a good chance it could be an upgraded successor to SQLite / SpatiaLite / GeoPackage. There are a lot of geospatial tools that embed SQLite and use it for spatial processing. DuckDB can be used in a similar way, but in most analytic use cases it will be much faster since it's a columnar database instead of a row-oriented one. And in general, it incorporates a bevy of cutting-edge database ideas, so it performs substantially faster. Indeed you could even see a 'GeoPackage 2.0' that just swaps in DuckDB for SQLite.
It could also serve as a powerful engine embedded in tools like QGIS. You could see a nice set of spatial operations enabled by DuckDB inside of QGIS, using it as an engine to better take advantage of all of a computer’s cores. I could also see QGIS evolve to be the ideal front-end for Cloud Native Geospatial datasets, letting users instantly start the browser and analyze cloud data, seamlessly downloading it and caching it when necessary, with DuckDB as the engine to enable it.
While I think DuckDB can be a great tool in our spatial world, I think it has even more potential to help bring spatial into other worlds. The next generation of business intelligence tools (like Rill and Omni) is built on DuckDB, and it’s becoming the favorite tool of general data scientists. Having great spatial support in DuckDB and in Parquet with GeoParquet enables a really easy path for a data scientist to start working with spatial data. So we in the spatial world can ride DuckDB’s momentum and the next generation of data tools will naturally have great geospatial support. This trend has started with Snowflake and BigQuery, but it’s even better to have an awesome open source data tool like DuckDB.
I believe that the spatial world is still a niche – the vast majority of data analysis done by organizations does not include geospatial analysis, though in many cases it could bring some additional insight. I think it's because of the long-standing belief that 'spatial is special', that you need a set of special tools, and that the data must be managed in its own way. If we can make spatial a simple 'join' that any data scientist can tap into, with a nice, incremental learning curve where we show value each step of the way, then I believe spatial can have a much bigger impact than it does today. I think it's great that DuckDB uses GDAL/OGR to be compatible with the spatial world, but I really love the fact that it doesn't make people 'learn GDAL' – they can continue using SQL, which they've invested years in. We need to meet people more than halfway, and I think GDAL/OGR and GEOS within DuckDB do that really well.
There are a ton of exciting things coming in DuckDB, and lots of work to make the spatial extension totally awesome. I’m really excited about DuckDB supporting Iceberg in 0.9, and I’m sure there will be even more performance improvements in that release. Top of the list for spatial advances is native GeoParquet support, and better handling of coordinate references and spatial indices, and I know the spatial extension author has a lot of ideas for even more. Unfortunately, much of the work is paused, in favor of funded contracts. I’m hoping to try to do some fundraising to help push DuckDB’s spatial support forward – if you or your organization is potentially interested then please get in touch.
One thing I do want to make clear is that I still absolutely love PostGIS, and am not trying to convince anyone to just drop it in favor of DuckDB. When you need a transaction-oriented OLTP database nothing will come close to PostgreSQL, and PostGIS’s 20 year head start on spatial support means its depth of functionality will be incredibly hard to match (for more on the history of PostGIS check out this recent podcast on Why People Care About Postgis And Postgres). PostGIS introduced the power of using SQL for spatial analysis, showing that you could do in seconds or minutes what expensive desktop GIS software would take hours or days to do. I think DuckDB will help accelerate that trend, by making it even faster and easier when you’re working locally. I’m also excited by DuckDB and PostGIS potentially spurring one another to be better. I worked on GeoServer when MapServer was clearly the dominant geospatial web server, and I’m quite certain that the two projects pushed one another to be better.
And finally – call to action. Give DuckDB a try! If you’ve made it all the way to the end of this post then it’s definitely worth just downloading it and playing with it. It can be hard to find that first project where you think it’ll make sense, but trust me – after you get over the hump you’ll start seeing more places. It really shines when you want to read in remote data on S3, like the data I’ve been putting up on Source Cooperative, and is great for lots of data transformation tasks. And any task that you’re using Pandas for today will likely perform better with DuckDB.
All GRASS GIS users worldwide will benefit from funding going to this group of people.
Meet Scott Parks, a trailblazer in the world of map development who is transforming hiking experiences through geospatial data. As the founder of Postholer, a resource for hikers that features interactive trail maps, Scott leverages open-source geospatial tools to provide hikers with smart mapping solutions.
Our team first became aware of Scott’s work when he built upon Kyle Barron’s demonstrations of how to create responsive browser-based tools to work with large volumes of geospatial data. Scott believes that cloud-native data has enormous potential in the way we present data, particularly spatial, on the web.
In this interview, we learn about Scott’s journey into the cloud-native geospatial world and gain insights into how new approaches to sharing data on the Internet have allowed him to make data more accessible to hikers.
While planning my first hike of the Pacific Crest Trail in 2002, I realized the lack of standard answers to frequently asked questions, such as, "How much near-trail snow is currently in the Sierra?" or "How cold does it get at location X in June?" Having a passion for the outdoors (public lands) and technology, I attempted to answer some of these questions.
In 2003, I introduced the embarrassingly general sierra snow graphic using data from CDEC and SnoTEL. Ironically, 2003 was the first year the Snow Data Assimilation System (SNODAS) was modeled, which eventually became my go-to source for snow data. That beginning led to the addition of climate, weather, wildfires, fauna and other data over the next 20 years (and counting).
Comparison of the Pacific Crest Trail Snow Conditions Maps: 2006 (left) vs. 2023 (right)
Catering to the specific needs of the hiking community is the easy part: I just have to listen. The hiking community asks the questions that I may or may not have asked myself. The difficult (and fun) part is creating answers for the community expressed in numerous iterations of web/print maps, data books, tables, and charts.
In 2019(?), I read an article by Planet or someone at Planet on the subject of Cloud Optimized GeoTIFF (COG). The idea was certainly interesting and it found a place in a corner of my mind, but no action on my part. In 2021, I read a COG use case by Sean Rennie & Alasdair Hitchins that mentioned the use of a JavaScript API called georaster-layer-for-leaflet created by Daniel DuFour. This critical API is what I use today for COG in my Leaflet maps.
Some time later, I stumbled across an example showing 12GB of census block data being displayed on a Leaflet map. That blew my mind, which led to my investigation of the FlatGeobuf (FGB) vector format. This indexed vector data format uses the same underlying HTTP protocol streaming mechanism used by COG. Björn Harrtell is the big brain behind Flatgeobuf.
There are other cloud-native vector formats that many find useful, such as PMTiles. For the sake of simplicity, I settled solely on FGB. Should I need more from my cloud-native vector data, I may revisit that decision.
For me, the most compelling reason to utilize cloud-native tools/data is the ability of the client/web app to retrieve data directly from cloud storage (such as S3) using no intermediate server or services. This means no tile servers, databases, caches or resources to support these services. Imagine a fully functional Leaflet map with numerous vector/raster layers using only a web browser and cheap cloud storage!
Read the fine manual, right? Both the FlatGeobuf and georaster-layer-for-leaflet APIs have fully functional examples and documentation. GDAL has notoriously good documentation for all of its supported formats and utilities. With that, it's a matter of reading the documentation, studying those examples and, through trial and error, adapting them to your own needs.
In the larger context of cloud-native and geo-spatial, when people like Even Rouault, Daniel DuFour, Paul Ramsey, Chris Holmes, Matthias Mohr, Howard Butler, et al, post on social media or mailing lists, I lean forward to listen. The geo-spatial community is fortunate to have so many smart/creative individuals. Oh, and what would any of us be without Google?
When met with a new spatial format or concept, my first stop is GDAL. Does GDAL recognize this format? Do the utilities accommodate this concept? If the answer is no, that immediately tells me the format/concept has not reached critical mass and maybe it never will. Being the first to introduce the newest tech or ’the next big thing’ into production is a big no-no. I want proven and tested technologies that will be around tomorrow.
As mentioned, with cloud-native data you don’t need any back-end servers or services. You don’t need resources to maintain those services. Once I went all in on cloud-native, I removed from production a WMS/WFS/WMTS server using PostgreSQL/PostGIS, MapServer, MapCache running on a r5.large EC2 server with 400GB of disk storage. Removing the need to run that server was huge. While I maintain the same kind of environment for development, it was no longer needed in production.
In 2007, I created my first web maps using Google Maps API and continued to do so until I fully switched to cloud-native in June of 2023. Relying solely on open source is a big plus.
Much of my cloud-native COG/FGB data gets updated hourly, daily, etc., and you still need back-end resources to do so. However, it's greatly simplified. All remote data sources are retrieved and processed using GDAL utilities wrapped in Bash scripts and scheduled with cron. Currently, my web maps support 87 raster/vector layers.
The biggest challenge for me was the mental leap to actually commit to cloud only and rewriting everything. I literally had 15 years of Google Maps API and back-end OGC services running my web maps. It took 6 months of developing/testing/rewriting until I was comfortable making the switch. For those starting with a blank sheet, cloud-native is an easy choice.
Static vs dynamic content will be a primary consideration for how fully you commit to cloud. With purely cloud-native data, this can push any dynamic spatial analysis onto the web app/client. In a perfect world I think this is ideal, but not always practical.
For example, on femaFHZ.com, I display various peril/hazard layers. The user has the ability to double-click on the map and retrieve 14 different perils for that lat/lon. Using purely cloud-native data sources, the web app would make 14 different requests from 14 different COGs, then process/display the results. That is not ideal. Using server side, I have a virtual raster (VRT) that contains these 14 different COG’s. The user lat/lon is sent via a simple web service, the virtual raster is queried and all 14 results are sent back. It’s a single request/response and extremely fast.
Thoughtful data creation. Perhaps you’re displaying building footprints for the entire state of Utah. Should every web client calculate the area of every polygon each time it accesses the data? Maybe it’s wise to add an ‘area’ attribute in each FGB feature to avoid so much repetitive client-side math. You could calculate the area of the polygon only when the user interacts with it. Your approach may vary when dealing with hundreds vs millions of features.
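As a purely illustrative sketch of that kind of precomputation (my own pipeline uses GDAL utilities in Bash; any GDAL-backed SQL engine such as DuckDB's spatial extension would do, and the file and column names below are placeholders):

-- Illustrative sketch only: bake an 'area' attribute into the FlatGeobuf at
-- creation time so clients don't recompute it. Names are placeholders; for
-- lon/lat data, reproject to a metric CRS before taking areas, and note that
-- ST_Read typically names the geometry column 'geom'.
LOAD spatial;
COPY (
    SELECT *, ST_Area(geom) AS area
    FROM ST_Read('utah_footprints.fgb')
) TO 'utah_footprints_with_area.fgb'
WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');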
Should my layer be COG or FGB? It’s not necessarily one or the other – it can be both. Using the building footprint example, is it practical to display vector footprints at zoom level 12 on your Leaflet map? You can’t make out any detail at that zoom level. Maybe have 2 versions of the data, a low resolution raster for displaying at zoom level 1-13 and vector at zoom 14-20. This is the technique I use at femaFHZ.com to display FEMA flood hazard data and the building footprint example.
Caution with FGB. If the extent of your 10GB vector data is contained within your map viewport extent at, say, zoom level 4, the FGB API will gladly download it all and try to display it. Using a similar approach as above, with wildfire perimeters, I create a low resolution vector data set for use at low zoom levels and use the complete data set at high zoom levels.
It’s OK if you don’t use cloud-native data exclusively. Most of my base maps (OSM, Satellite, Topo, etc) are hosted and maintained by someone else (thank you, open source community!). These are likely large, high availability tile cache servers behind content delivery networks. Don’t reinvent the wheel unless you can build a better mousetrap!
Keep it simple. Use core open source technology such as GDAL, SQL, QGIS, PostgreSQL/PostGIS, SQLite/SpatiaLite, STAC, Leaflet, and OpenLayers. Have a good understanding of how to use your tools! That cool tech you're emotionally attached to may not be supported tomorrow, and it might not be the best tool for the job.
Beware of proprietary geo-spatial companies offering cloud-native solutions who lost sight of their mission many, many financial statements ago.
The topic of cloud-native data is incomplete without discussing STAC. When working with many high resolution COGs, it's difficult to juggle so many rasters. STAC allows you to easily identify individual COGs (via their metadata) within a collection of COGs (a catalog) defined by a bounding box. STAC has become widely accepted by private and government organizations alike. It's imperative we continue to promote STAC, as it's the perfect marriage with COG. Radiant Earth's GitHub page is an excellent place to start for learning more about STAC.
The FOSS GIS community goes to great lengths to support open source endeavors. Fortunately, many private and government entities, large and small, step up financially to support these endeavors. Many (if not all) proprietary geo-spatial companies use open source software in their offerings. Operating within the bounds of the license doesn't absolve you of the responsibility of supporting open source. It's in everyone's best interest to support open source!
Thank you Radiant Earth for allowing me to speak on such an important topic, that’s necessary for the future of geospatial data on the web.
Going to be presenting at FOSS4G on Thursday in Swansea at the University of Wales Trinity St David, and realised this institute was originally called the West Glamorgan Institute of Higher Education ('Wiggy', as we called it then) back in the 80s, when I was a student in Graphic Design.
On reflection... I wondered what happened to some of those lecturers, and in particular Bart O'Farrell. In homage to this great Welsh artist and character, I'm posting the only interview I can find of his career while it remains online.
https://cornishbirdblog.com/bart-ofarrell-wizard-on-the-lizard-full-unedited-interview/
The absence of a suitable online platform can seem daunting when dealing with map data. Nevertheless, armed with the right technology – in this case, Geographic Information System (GIS) software – analyzing maps becomes a breeze, devoid of unnecessary complexity. What is GIS? GIS, or geographic information system, is a tool that’s a game-changer in […]
The post Top 6 Benefits of GIS Utility Mapping Service first appeared on Grind GIS - GIS and Remote Sensing Blogs, Articles, Tutorials.

As I mentioned in my previous post, I recently put out my first open source Python project. I wouldn't say I'm super proud of it, at least not yet, as it's far from what I imagine it could be. It's also kind of a weird mix: a very specific tool to better format the Google Open Buildings dataset, mixed in with benchmarking experiments for different ways to format that data. But it felt good to actually release it, and it was awesome to immediately get help on the packaging. I wanted to use this post to share some of the interesting results of working with different tools and formats.
Note: The following comparisons should not be considered true benchmarking. It’s just a reporting of experiments I did on my laptop, and it only tests writing to the formats, not reading from them. But anyone is welcome to use the same commands as I did, as I released the code I was using. Some day I hope to make a more dedicated performance comparison tool.
The core thing the library does is translate the Google Open Building CSV’s into better formats. For this step I just use the same S2 structure that Google had, so that they’re in faster & better formats to do more with. FlatGeobuf, which is ideal for making PMTiles with Tippecanoe due to its spatial index, and GeoParquet, which is compact and fast, are the primary formats I explored.
I initially used GeoPandas for the Google Open Buildings dataset v2, having heard great things about it and after ChatGPT suggested it. Usually I’d just reach for ogr2ogr, but I wanted to split all the multipolygons into single polygons and didn’t know how to do that (I also wanted to drop latitude and longitude columns and it was cool how easy that is with GeoPandas). During the processing of the data I got curious about the performance differences between various file formats. I started with 0e9_buildings.csv, a 498 megabyte csv file. It took 1 minute and 46.8 seconds with FlatGeobuf and 11.3 seconds with GeoParquet. So I decided to test with Shapefile and Geopackage output, in case FlatGeobuf is just slow.
Comparing Pandas processing time for FlatGeobuf, Geopackage, Parquet, and Shapefile
Nope, FlatGeobuf is faster than either, with Geopackage taking 1 minute and 40 seconds and Shapefile 1 minute and 42.3 seconds.
The other interesting thing was the size of the resulting files:
GeoPackage, Shapefile, and FlatGeobuf might not be much smaller than the CSV in terms of file size, but their spatial indices enhance performance in most use cases. You can also drag and drop them into almost any GIS system, while the CSV generally needs some intervention.
The chart above highlights another big benefit of GeoParquet – it’s substantially smaller than the other options. The reason for the difference is that Parquet is compressed by default. If you zip up the other formats they’re all about the same size as Parquet. But you typically have to unzip the files to actually work with them. Of course Even Rouault has changed the calculus with SOZip, which enables you to zip up traditional formats but still read them with GDAL/OGR. But with Parquet it’s really nice that you don’t have to think about it, the compression ‘just happens’, and you get much smaller files by default.
Note that I didn't include GeoJSON since I don't think it's appropriate for this use case. I love GeoJSON, and it's one of my favorite formats, but its strength is web interchange – not large file storage and format conversion. But for those interested, the GeoJSON version of the data is 807 megabytes.
Even cooler, with Parquet you can select different types of compression. The default compression for most of the tools is ‘snappy’, which I’m pretty sure means that it optimizes speed of compression / decompression over absolute size. So I was curious to explore some of the trade-offs:
| | Snappy | Brotli | GZIP |
|---|---|---|---|
| Time (s) | 11.03 | 17.8 | 15.3 |
| Size (MB) | 151.9 | 103.8 | 114.2 |
The original CSV file was obviously smaller for this test, but it’s pretty cool that you can make the size 30% smaller (though it takes a bit longer). The one big caveat here is that not all the parquet tools in the geospatial ecosystem support all the compression options. DuckDB doesn’t support Brotli, for input or output, while GDAL/OGR does support it, but it depends on what compression the linked Arrow library was compiled with, so I wouldn’t actually recommend using Brotli (yet). But it’s great to see that mainstream innovation on compression gets incorporated directly into the format, and that all operations are done on the compressed data.
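Picking the codec yourself is just one more option on the Parquet writer – for example in DuckDB (file names are placeholders; DuckDB's writer offers snappy, gzip, and zstd):

-- Sketch: write Parquet with an explicitly chosen codec instead of the default
-- snappy. (Brotli isn't available in DuckDB's writer, as noted above.)
COPY (SELECT * FROM '0e9_buildings.parquet')
TO '0e9_buildings_gzip.parquet' (FORMAT PARQUET, COMPRESSION GZIP);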
For processing v3 of the Google Buildings data I also wanted to try using DuckDB directly as a processing engine, since with v2 the big files really pushed the memory of my machine and I suspected DuckDB might perform better. So I took the same files and built up the same processing steps (load CSV, remove the latitude and longitude columns, split the multipolygons), but in pure DuckDB.
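A rough sketch of the heart of that, writing the Parquet output (the multipolygon split and the GDAL-based outputs are left out here), is:

-- Read the raw CSV, drop the redundant latitude/longitude columns, and write
-- compressed Parquet directly. The multipolygon-splitting step is omitted.
COPY (
    SELECT * EXCLUDE (latitude, longitude)
    FROM read_csv_auto('0e9_buildings.csv')
) TO '0e9_buildings.parquet' (FORMAT PARQUET);

And the results were quite interesting: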
Whoa! GeoPackage in DuckDB is super slow, but everything else is substantially faster (I suspect there’s a bug of some sort, but it is a consistent bug with any gpkg output of DuckDB).
Removing GeoPackage from the mix clarifies the comparison:
So DuckDB’s speed compared to Pandas for FlatGeobuf and Shapefile is quite impressive, and it is a bit faster for Parquet. One note is that these comparisons aren’t quite fair, as none of the DuckDB outputs are actually ‘correct’. Parquet output is not valid GeoParquet, since it doesn’t have the proper metadata, but it is a ‘compatible’ parquet file (WKB in lat/long w/ column named ‘geometry’), so the awesome GPQ is able to convert it. For this particular file the conversion adds 3.64 seconds, so even with the conversion it’s still faster than Pandas. And DuckDB should support native GeoParquet output at some point, and writing the proper metadata shouldn’t add any performance overhead, so the headline times will be real in the future.
FlatGeobuf and Shapefile output from DuckDB also need adjustment: the output does not actually include projection information. This is another one I assume will be fixed before too long. In the meantime you can run an `ogr2ogr` post-processing step, or assign the projection on the fly when you open the file in QGIS (it tends to ‘guess’ right and display correctly; it just shows a little question mark by the layer and lets you choose the right projection).
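For example, one possible post-processing fix (just a sketch, with placeholder file names) is to stamp the CRS on with `ogr2ogr`; the `-a_srs` flag assigns a CRS label without reprojecting:

```python
# Assign the missing CRS to a FlatGeobuf file produced by DuckDB.
# File names are placeholders; -a_srs labels the CRS without reprojecting.
import subprocess

subprocess.run(
    ["ogr2ogr", "-a_srs", "EPSG:4326", "buildings_with_crs.fgb", "buildings.fgb"],
    check=True,
)
```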
I also got curious about how `ogr2ogr` compares, since DuckDB uses it under the hood for all of its format support except Parquet. I just called it as a subprocess – a future iteration could compare calling the Python bindings or using Fiona. I did not take the time to figure out how to split multipolygons in a pure ogr2ogr process (tips welcome!), and was mostly just interested in raw comparison times.
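The harness looked roughly like the sketch below (not the exact commands from the benchmark; the open options and the column list assume the Open Buildings CSV schema, and the `convert` helper is illustrative):

```python
# Time an ogr2ogr conversion of an Open Buildings CSV, called as a subprocess.
# GEOM_POSSIBLE_NAMES tells GDAL's CSV driver which column holds the WKT geometry.
import subprocess
import time


def convert(csv_path: str, out_path: str, driver: str) -> float:
    start = time.time()
    subprocess.run(
        [
            "ogr2ogr",
            "-f", driver,
            "-oo", "GEOM_POSSIBLE_NAMES=geometry",
            "-a_srs", "EPSG:4326",
            "-select", "area_in_meters,confidence,full_plus_code",  # drops lat/long
            out_path,
            csv_path,
        ],
        check=True,
    )
    return time.time() - start


print(convert("219_buildings.csv", "219_buildings.fgb", "FlatGeobuf"))
```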
So DuckDB and OGR end up quite comparable, since DuckDB uses OGR under the hood. DuckDB is a little bit slower for GeoPackage and FlatGeobuf, which makes good sense as it’s doing a bit ‘more’. DuckDB’s pure Parquet output is faster since it isn’t using OGR (6.88 seconds with OGR, 5.97 with DuckDB), but as pointed out above right now it needs a GPQ post-processing step, which takes it to 9.61 seconds for fully valid GeoParquet output.

Performance of different file sizes

The other question I investigated a bit is whether the size of the file affects the relative performance much. So I tried with a 5.03 GB file (103_buildings), a 101 MB file (219_buildings), a 20.7 MB file (0c3_buildings) and a 1.1 MB file (3b6_buildings).
These were all done with ogr2ogr, since it gave the most consistent results. The relative performance was all quite comparable, except in the smallest file where FlatGeobuf edges out GeoParquet, and Shapefile takes a performance hit – but these times are all sub-second, so the difference isn’t going to make any real difference.
I also tried to compare all four on the largest CSV file in there, clocking in at 21.78 GB.
You may notice that Shapefile isn’t an option – that’s because it completely failed, highlighting why there are sites encouraging you to switch from Shapefile. First came the warnings that happen on every Shapefile output: the column names get truncated, since Shapefile can’t handle more than 10 characters in its field names. Then with larger files I’d get a number of warnings about the 2 GB file size limit. And with inputs over 10 gigabytes or so it would fail completely, topping out at a file size of around 4 gigabytes. So Shapefile is just not a good format for working with large amounts of data (I tend to avoid Shapefile anyway, since GeoPackage is better in almost every way and is now nearly as widely supported).
The performance of the other three was interesting. Parquet pulls ahead even more, processing the 21 GB in under 5 minutes and reducing its size to just 8.62 GB, while FlatGeobuf took just under an hour, and GeoPackage took over two and a half hours.
As I mentioned above, you are more than welcome to try out your own version of this testing. All the code I used is up at github.com/opengeos/open-buildings. You can just run `pip install open-buildings` and then, from the command line, run `open_buildings benchmark --help` to see the options. All the input needs to be Google Open Buildings CSV files downloaded from sites.research.google/open-buildings/#download.
I hope to spend some time refactoring the core benchmarking code to run against any input file, and also to measure read times, not just write times.
Like I mentioned in the warning above, this is far from rigorous benchmarking, it’s more a series of experiments, so I hesitate to draw any absolute conclusions about the speed of particular formats or processes. There may well be lots of situations where they perform more slowly. But for my particular use case, on my Mac laptop, GeoParquet is quite consistently faster and produces smaller output. And DuckDB has proven to be very promising in its speed and flexibility as a processing engine.
Once DuckDB manages to fix the highlighted set of issues it’s going to be a really compelling alternative in the geospatial world. My plan over the next few months is to use it for more processing of open buildings data – completing the cloud-native distribution of v3 of Google Open Buildings, and also using it to do a similar distribution of Overture (and ideally convincing them to adopt GeoParquet + partitioning by admin boundaries). I’ll aim to release all the code as open source, and if I have the time I’ll compare DuckDB to alternate approaches. I’m also working on a command-line tool to easily query and download these large buildings datasets, transforming them into any geospatial format, all using DuckDB. The initial results are promising after some tweaking, so hopefully I’ll publish that code and blog post soon too.
The Cloud-Native Geospatial Foundation, in partnership with the SpatioTemporal Asset Catalog Project Steering Committee (STAC PSC), is excited to announce the first in-person STAC sprint since 2019. The 8th STAC sprint will take place at Element 84’s offices in Philadelphia, PA from September 26th to 28th. The sprint has two goals, described below.
Building on the previous successes of STAC Sprints, the event will bring together a community of interested collaborators to push the entire STAC ecosystem forward.
On the technical side, the goal is to get the STAC Specification ready for 1.1.0, which entails addressing all the issues in the 1.1.0 milestone. The STAC PSC has been holding monthly STAC working sessions to work through the open issues in the stac-spec GitHub repository. The team categorizes the issues into specific date milestones and thematic tags to better organize and prioritize tasks. By the time of the sprint, the team will have discussed all of the open issues in the repository and will have a set list of issues to address (the 1.1 milestone and “new extension” tagged issues). Additionally, multiple updates to `stac-spec` and `stac-api-spec` extensions will be addressed during the sprint.
In order to help more people benefit from STAC, we need to develop educational materials that introduce people to STAC and show how to use it. We are currently collecting information on community needs to identify which educational materials to develop at the sprint. Additionally, the STAC PSC has prioritized translating STAC materials into languages other than English. Depending on the language skills of sprint attendees, translating documents may also be a sprint activity.
Throughout the three-day event, each day of the STAC sprint will follow a similar structure.
If you are interested in participating, we invite you to fill out the application form here. We are also exploring an option for virtual participation. If interested in virtual attendance, please make sure to indicate it on the form.
If you are interested in attending in person but don’t have travel funding we may have some small travel grants available.
We are seeking sponsors who want to help make the sprint a success. To learn more about sponsorship opportunities, please see our sponsorship prospectus. We are grateful to Element 84 for hosting the sprint.
$ sudo apt install libboost-all-dev libqt5charts5-dev libxerces-c-dev libncurses-dev cmake-curses-gui libqt5opengl5-dev pybind11-dev
$ git clone --recursive https://github.com/pcraster/pcraster.git
$ cd pcraster && mkdir build && cd build
$ cmake -G"Unix Makefiles" -D CMAKE_BUILD_TYPE=Release -DPCRASTER_BUILD_TEST=OFF ..
$ make -j4
$ sudo make install
$ echo "export PYTHONPATH=$PYTHONPATH:/usr/local/python" >> ~/.bash_profile
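Once the PYTHONPATH change takes effect, a quick sanity check (assuming the default /usr/local install prefix used above) is simply importing the bindings:

```python
# Sanity check that the PCRaster Python bindings are visible on PYTHONPATH
# (assumes the /usr/local/python path exported above).
import pcraster

print(pcraster.__file__)
```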
Don't believe everything a map appears to show. The colour, size, and position of labels, and the scale hierarchy of information, can all be used to deceive you.
https://www.outsideonline.com/outdoor-adventure/exploration-survival/how-maps-lie/
Monroe and Daddy
Kona
30 minute wait… 📷
Finished reading: Hands-On Azure Digital Twins: A practical guide to building distributed IoT solutions by Alexander Meijers 📚
I’ve been working with Digital Twins for almost 10 years, and as simple as the concept is, ontologies get in the way of implementations. The big question is how you actually implement one: how do you create a digital twin and deploy it to your users?
Let’s be honest though, the Azure Digital Twin service/product is complex and requires a ton of work. It isn’t a matter of uploading a CAD drawing and connecting some data sources. In this case Meijers does a great job of walking through how to get started. But it isn’t for beginners: you’ll need previous experience with Azure cloud services and Microsoft Visual Studio, and the ability to debug code. If you’ve got even a general understanding of these, though, the walk-throughs are detailed enough to learn the idiosyncrasies of the Azure Digital Twin process.
The book takes you through the process of understanding what an Azure Digital Twin model is, how to upload models, how to develop relationships between them, and how to query them. Once you have an understanding of this, Meijers dives into connecting the model to services, updating the Azure Digital Twin models, and then connecting to Microsoft Azure Maps to view the model on maps. Finally, he showcases how these Digital Twins can become smart buildings, which is the hoped-for outcome of all the work.
The book has a lot of code examples, and you can download them all from a GitHub repository. Knowledge of JSON and JavaScript, plus Python and .NET or Java, is probably required. But even if you don’t know how to code, this book is a good introduction to Azure Digital Twins. While there are pages of code examples, Meijers does a good job of explaining how and why you would use Azure Digital Twins. If you’re interested in a hosted Digital Twin service managed by a cloud provider, this is a great resource.
I felt like I knew Azure Digital Twins before reading this book, but it taught me a lot about how and why Microsoft did what they did with the service. Many aspects that had me scratching my head became clearer, and the book gave me background I didn’t have before. It does require an understanding of programming, but after finishing it I felt that Meijers' ability to describe the process outside of code makes the book well worth it for anyone who wants to understand the concepts and architecture of Azure Digital Twins.
Thoroughly enjoyed the book.
Splash pad Saturday
Water table time! 📷
Me: I’d like a little basil for a small dish tonight.
Instacart: Here is a tree of basil.
📷
Why glTF is the JPEG for the metaverse and digital twins
JPEG took advantage of various compression tricks to dramatically shrink images compared to other formats like GIF. The latest version of glTF similarly takes advantage of techniques for compressing both geometry of 3D objects and their textures.
I actually like this analogy. Consider what I said 2 years ago:
It is widely supported these days but it really isn’t a great format for BIM models or other digital twin layers because it is more of a visual format than a data storage model. Think of it similar to a JPEG, great for final products, but you don’t want to work with it for your production data.
IFC is a much better format for Digital Twins, but glTF does a great job with interoperability and storage. Much like you might store GIS data in a geodatabase and share a map as a JPEG, you should store your data in IFC and share it as glTF.
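As a hedged illustration of that workflow: if the authoritative model lives in IFC, one way to produce the shareable glTF is IfcOpenShell’s IfcConvert, assuming a build with GLB output enabled; the file names below are placeholders.

```python
# Convert an authoritative IFC model into a glTF binary (.glb) for sharing,
# using IfcOpenShell's IfcConvert CLI. Paths are placeholders.
import subprocess

subprocess.run(["IfcConvert", "building.ifc", "building.glb"], check=True)
```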
🏢
Spring! Emmy picking flowers. She’s a child of the earth. 📷
My son’s graduation present (delayed a year because of COVID) was a trip to Italy. I’m stuck at home with two sick kids with earaches, watching him through the Find My app. I’m so proud of him and excited for him to see Italy.
Plus he got to buy his first legal beer at the Heathrow Airport.
🇮🇹
Funny how when you start talking about something, it triggers some other thing. Well in a discussion about SpaceX with friends, I began to think about TerraServer. I couldn’t remember what happened to it, but of course I blogged about it over 10 years ago.
In its time, I used it to test WMS, but I’m not sure I ever really used it for anything else. Much like LandSat, I’ll shed no tears. It served its purpose and now it must die.
OUCH! Boy, I hope when I go I get a better eulogy. I don’t think I was being completely fair about TerraServer. I mean Jim Gray, scaling SQL Server on Windows NT, how could one not geek out? It used to be very hard to get access to open data, and TerraServer did bring a ton of imagery data to users.
I mean unpack this:
The TerraServer demonstrates the scalability of Microsoft’s Windows NT Server and SQL Server running on Compaq AlphaServer 8400 and StorageWorks™ hardware.
That’s the world it grew up in. Cheers TerraServer! 🗺
I was just thinking about the iPod today and how it used to be such a critical part of my life. Now not only do I not use an iPod but I don’t even buy my music anymore, just use Apple Music. But back then, fitting all my music on an iPod was amazing.
The last iPod I ever bought was this one, an iPod nano 7th generation.
I bought it for my wedding back in 2015. I wanted to make sure our song for the first dance was not screwed up and I had zero faith the band could manage it without something as simple as an iPod. Because engraving was free, I added this touch.
It ended up being true. 🎶
This spring, my son changed his major to GIS. He almost did Geography, but I was able to convince him these computers are going to take off eventually and he needs to have something more modern. He’s finished up his first semester and he says he loves it. We’ll see!
I told him all that matters is that he spends time on Python. I’d already exposed him to it when his major was Biology, but I would wager it matters even more for GIS. He starts some of these classes this fall, so I can’t wait to see what they’re teaching him at Arizona State’s School of Geographical Sciences & Urban Planning.
🗺
I’ve been thinking about how to start sharing content again. I have a couple ways in the past but they are all disjointed:
Spatially Adjusted - my blog I’ve been posting to since 2005. I feel like it has done what it needed to do and I need to move on. Great content, but that’s not how I work anymore.
Spatial Tau - my newsletter. I’ve been hot and cold with this; I get some of the best reactions from people about it, but I don’t like the interface. I could move people over here, but I don’t know if that is the best solution. Maybe content is needed.
Podcasts - I’ve had a couple; Hangouts with James Fee and Cageyjames & Geobabbler are probably the two biggest ones. I just haven’t had the time to edit a podcast anymore.
Twitter - @cageyjames… enough said
I backed micro.blog years ago on Kickstarter and I’ve liked the interface, so maybe this is where I’ll try and post now. I reserve the right to change my mind.
Photo by JOHN TOWNER