# tlohde

euclidean cities

00:25 14/05/2026

I am currently reading Mapmatics: How We Navigate the World Through Numbers by Paulina Rowińska.[1]

Chapter 4 — Distanced — starts by discussing homeomorphic representations that distort distances but preserve connections, such as Harry Beck's map of the London Underground. This leads into a brief section critiquing the isochrones on the maps dotted across London that aim to depict where you can walk to in 5 or 15 minutes, but actually show where a crow could fly to, because they don't pay any attention to the street network.

In some silly little corner of my silly little head, a silly little seed was planted.

By the time I'd finished reading the next section where euclidean distance and manhattan distance were introduced, that seed had taken root and grown into a fully formed silly little question.

what's the most euclidean city?

Buildings and railways and rivers and churches and other people's homes all have a habit of getting in the way. From where I am currently sitting, it is a mere ~400 metres to a bakery that'll bankrupt you as fast as you can say "two cinnamon buns please", but to get there I have to walk ~800 m. That's a ratio of 2.

From the train station to the climbing gym I used to frequent it's a short 2.15 km walk. But that's still 1.5 times further than the straight line distance of 1.44 km.

The aquarium is only...you get the idea.

A city that had straight roads from everywhere to everywhere else would be perfectly euclidean (and weird) (Fig 0d). I don't think such a city has been constructed, so where's the next best place? Which city on average has the lowest ratio of network distance to euclidean distance? Or, to twist and frame this question in a way that makes it sound slightly more useful, which city is most accommodating to the reluctant and lazy pedestrian or tired cycle courier?
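The ratio in question is nothing more than network distance divided by straight-line distance. A minimal sketch using the two walks quoted above (the function name `detour_ratio` is made up for illustration, it isn't part of any library):

```python
def detour_ratio(network_m: float, euclidean_m: float) -> float:
    """Network distance divided by straight-line distance (1.0 = perfectly direct)."""
    if euclidean_m <= 0:
        raise ValueError("euclidean distance must be positive")
    return network_m / euclidean_m


# distances quoted in the text, in metres
bakery = detour_ratio(800, 400)
gym = detour_ratio(2150, 1440)

print(f"bakery: {bakery:.2f}, climbing gym: {gym:.2f}")
# bakery: 2.00, climbing gym: 1.49
```

A perfectly euclidean city would score 1.0 for every pair of points; anything above that is detour.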

Fig 0: Ljubljana, Slovenia: 250 m radius. Left to right, top to bottom: (a) all nodes in the street network (n_nodes=184); (b) the street network as it is (n_edges=448); (c) the network as it would be if all streets were straight; (d) the perfect network where one would never have to travel further than necessary (n_edges=16,836, i.e. n_nodes * (n_nodes - 1) / 2).
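The edge count for the "perfect" network in panel (d) is simply the number of ways of choosing two nodes from 184. A quick check, using only the node count given in the caption:

```python
from math import comb

n_nodes = 184  # nodes in the 250 m Ljubljana example (Fig 0a)

# one straight edge for every unordered pair of nodes
n_perfect_edges = comb(n_nodes, 2)  # same as n_nodes * (n_nodes - 1) // 2

print(n_perfect_edges)  # 16836
```

That is roughly 38 times the 448 edges the real network gets by with.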

literature review

I'm saving the reading until after I've done this, because I don't want to discover that someone else has already done it, and in a way that is better than I could possibly manage. That would be demoralising, and would also rob me of the joy of mucking about with: (a) an idea; (b) some data; and (c) a new-to-me plotting library.

This may become a future post.

method

data

Representative points for each european capital were taken from OpenStreetMap, and for each city, a 3 km buffer was drawn and the street network within this buffer extracted using the python library osmnx. The networks were reprojected into the local UTM zone and some network summary statistics calculated with osmnx.stats.basic_stats.

show the code
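import osmnx as ox
import pandas as pd
from tqdm import tqdm

# `capitals` is assumed to be a GeoDataFrame of capital points (lon/lat)
# with a `name` column
basic_stats = []
for capital in tqdm(capitals.itertuples(), total=len(capitals)):
    name = '_'.join(capital.name.split(' '))

    # walking network within 3 km of the capital's representative point
    digraph = ox.graph_from_point(
        center_point=(capital.geometry.y, capital.geometry.x),
        dist=3_000,
        network_type='walk',
        simplify=True,
    )

    # reproject to the local UTM zone (the default when no CRS is given)
    proj_graph = ox.projection.project_graph(digraph)

    ox.io.save_graphml(
        proj_graph,
        f"graphs/{name}.graphml"
        )
    basic_stats.append(
      {capital.name: ox.stats.basic_stats(proj_graph)}
    )

# one row per city, one column per statistic
stats_df = pd.concat([pd.DataFrame.from_dict(s) for s in basic_stats], axis=1).T
stats_df.to_json('basic_stats.json')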

sampling & routing

To estimate the network's average ratio of network distance to euclidean distance, pairs of nodes were drawn at random (without replacement) from an area within 1500 m of each city centre point, and both the shortest network path (osmnx.routing.shortest_path) and the straight line distance between them calculated. The motivation behind sampling nodes from a smaller area was to (a) mitigate the shortest route being via a node outside of the 3 km zone originally selected; and (b) keep it to an area that I assumed[2] would be where rates of pedestrianism are greatest. In the spirit of not letting my laptop get too hot and bothered[3], only 10% of the nodes were sampled from each city. This proportional sampling means that the number of nodes sampled, and subsequently the number of ratios calculated, varied for each city (table 0).

show the code
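import random

import geopandas as gpd
import osmnx as ox
from shapely import MultiLineString, line_merge


def sample_network(capital, bufferdist=1500, samplefrac=0.05):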
    
    name = '_'.join(capital.name.split(' '))

    graph = ox.io.load_graphml(f"graphs/{name}.graphml")
    all_nodes, all_edges = ox.convert.graph_to_gdfs(graph)
    crs = all_nodes.crs

    # select and sample from nodes within `bufferdist` of capital point
    inner_buffer = (
        capitals.loc[capitals['name']==capital.name]
        .to_crs(crs)
        .buffer(bufferdist)
        .iloc[0]
    )

    inner_nodes = all_nodes[
        ~all_nodes.intersection(inner_buffer).is_empty
    ].index.tolist()

    # each sample is a pair of nodes, so samplefrac=0.05 uses up 10% of the
    # inner nodes
    n_samples = int(len(inner_nodes) * samplefrac)

    linestrings = []
    for _ in range(n_samples):
        # pop used nodes
        orig = inner_nodes.pop(random.randrange(len(inner_nodes)))
        dest = inner_nodes.pop(random.randrange(len(inner_nodes)))

        # but allow routing across whole graph - so route can leave
        # the inner_buffer if needed
        route = ox.routing.shortest_path(
            graph,
            orig,
            dest,
            weight='length'
        )

        route_gdf = ox.routing.route_to_gdf(
            graph,
            route,
            weight='length'
        )

        # merge route edges to single linestring
        ls = line_merge(MultiLineString(route_gdf['geometry'].tolist()))
        linestrings.append(ls)

    # network length, straight-line distance between the route's endpoints,
    # and their ratio; returned reprojected to EPSG:4326
    gdf = gpd.GeoDataFrame(geometry=linestrings, crs=crs)
    gdf['network'] = gdf.length
    gdf['euclidean'] = gdf.boundary.apply(
      lambda mp: mp.geoms[0].distance(mp.geoms[1])
      )
    gdf['ratio'] = gdf['network'] / gdf['euclidean']
    gdf['city'] = capital.name
    gdf['utm_crs'] = crs.to_epsg()
    
    return gdf.to_crs(4326)

And I think that's ok[4].
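The pair-drawing step can be pulled out of the routing code and looked at on its own. This is a sketch rather than the code that was run: `sample_pairs` and the `seed` argument are made up for illustration, and it assumes the default `samplefrac=0.05` shown above. Because each draw pops two nodes, a 5% fraction of pairs uses up 10% of the nodes, and no node appears twice.

```python
import random


def sample_pairs(nodes, samplefrac=0.05, seed=None):
    """Draw origin/destination pairs without replacement.

    The number of pairs is samplefrac * len(nodes), so twice that
    fraction of the nodes ends up being used.
    """
    rng = random.Random(seed)
    pool = list(nodes)
    n_pairs = int(len(pool) * samplefrac)
    pairs = []
    for _ in range(n_pairs):
        # popping removes the node from the pool, so it can't be drawn again
        orig = pool.pop(rng.randrange(len(pool)))
        dest = pool.pop(rng.randrange(len(pool)))
        pairs.append((orig, dest))
    return pairs


pairs = sample_pairs(range(1000), samplefrac=0.05, seed=42)
used = [node for pair in pairs for node in pair]

print(len(pairs), len(used), len(set(used)))  # 50 100 100
```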

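Table 0: number of node pairs sampled per city

| city | n |
| --- | --- |
| Ankara | 72 |
| Valletta | 76 |
| Vaduz | 76 |
| City of San Marino | 82 |
| Andorra la Vella | 87 |
| Bucharest | 136 |
| Podgorica | 149 |
| Nicosia | 154 |
| Skopje | 155 |
| Monaco | 157 |
| Amsterdam | 164 |
| Sarajevo | 167 |
| Budapest | 169 |
| Lisbon | 171 |
| Pristina | 171 |
| Luxembourg | 200 |
| Chișinău | 201 |
| Stockholm | 207 |
| Belgrade | 208 |
| Reykjavik | 231 |
| Vatican City | 243 |
| Athens | 247 |
| Copenhagen | 261 |
| Sofia | 264 |
| Madrid | 266 |
| Zagreb | 268 |
| Dublin | 275 |
| Kyiv | 276 |
| Prague | 278 |
| Rome | 284 |
| Vilnius | 293 |
| Riga | 294 |
| Bern | 299 |
| Ljubljana | 318 |
| Vienna | 330 |
| Oslo | 339 |
| Brussels | 343 |
| Tallinn | 345 |
| Tirana | 347 |
| Minsk | 360 |
| London | 375 |
| Berlin | 375 |
| Bratislava | 381 |
| Paris | 400 |
| Moscow | 490 |
| Warsaw | 517 |
| Helsinki | 848 |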

statistics

To ascertain whether or not any differences in euclidean-ness between two given cities' street networks are statistically significant, some statistics were done...

This involves comparing the distributions of network:euclidean ratios across cities (e.g. Fig 1 shows the distribution for one city). ANOVA, or analysis of variance, is the tool for the job. However, it requires a few assumptions to be met:

Fig 1: Distribution of network:euclidean distance ratios across Ljubljana.

Independence is a safe assumption. These are different cities, separated by some distance, in different geographical and geological settings. Sure some urban planning practices, designs and themes might have been copied here and there, but, whatever. Check.

Normality was tested with scipy.stats.normaltest and they all came back as normal enough.[6] Check.

Homoscedasticity. Fail. scipy.stats.bartlett suggested that these samples are unlikely to have equal variances.

So, no ANOVA today. Kruskal-Wallis, however, can be used as a substitute as it operates on the ranks of the data, rather than the data itself.
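The sequence of checks described above might look something like the following. This is a sketch, not the analysis that was run: the three "cities" are synthetic lognormal samples invented so the block is self-contained, and the names `samples`, `normal_p`, `bartlett_p` and `kruskal` are placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# synthetic stand-ins for per-city ratio samples (NOT the real data):
# ratios are >= 1 with a tail, and both centre and spread differ by city
samples = {
    "city_a": 1 + rng.lognormal(mean=-1.6, sigma=0.4, size=200),
    "city_b": 1 + rng.lognormal(mean=-1.2, sigma=0.6, size=200),
    "city_c": 1 + rng.lognormal(mean=-0.8, sigma=0.8, size=200),
}
groups = list(samples.values())

# normality, one test per city (small p-value = evidence against normality)
normal_p = {city: stats.normaltest(x).pvalue for city, x in samples.items()}

# homoscedasticity across all cities (small p-value = unequal variances)
bartlett_p = stats.bartlett(*groups).pvalue

# rank-based alternative to one-way ANOVA
kruskal = stats.kruskal(*groups)

print(normal_p)
print(f"bartlett p = {bartlett_p:.3g}")
print(f"kruskal-wallis H = {kruskal.statistic:.1f}, p = {kruskal.pvalue:.3g}")
```

With the real data, `samples` would hold one array of ratios per capital.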

Kruskal-Wallis will only tell you if there is a difference between groups, but it won't tell you which cities are different. For that you need a bit of post-hoc analysis.[7] Pairwise comparisons were carried out using Conover's test as implemented in scikit_posthocs with step-down Bonferroni adjustments to account for there being multiple comparisons.
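With the 47 cities in Table 0 there are 1,081 possible pairs, hence the need for an adjustment. The step-down Bonferroni (Holm) procedure is short enough to write out by hand. The sketch below is a plain-Python illustration of that adjustment only, not the scikit_posthocs internals, and the four raw p-values are made up.

```python
def holm_adjust(pvalues):
    """Holm's step-down Bonferroni adjustment.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotonic, and everything
    is capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
adjusted = holm_adjust(raw)

print([round(p, 3) for p in adjusted])  # [0.03, 0.06, 0.06, 0.02]
```

As far as I know, passing `p_adjust='holm'` to `scikit_posthocs.posthoc_conover` requests this same correction, though that is worth checking against the library's documentation.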

results

the headline

Madrid is the most euclidean; Valletta the least.

the detail

They are different (Fig 2). Well, some of them are at least. It is a spectrum. You could even say there is spatial heterogeneity.[8]

Madrid, Brussels, Paris, and Oslo are the most euclidean, apparently, with median ratios of 1.195, 1.216, 1.222, and 1.228, respectively, and if Conover's test is to be believed, they are not significantly different from one another (p-value > whatever threshold you fancy, 0.05, 0.001) (Fig 2). At the other extreme: Valletta (2.440), Vaduz (1.397), and Budapest (1.375) were the least euclidean, and can all be considered similarly inefficient.

Fig 2: Heatmap of post-hoc pairwise comparisons using Conover's test. Small p-values allow us to reject the null hypothesis, and conclude that there is a difference between any given pair of cities, i.e., differences can be found in the purples around the edge of the figure. note: the colour scale is logarithmic.

Cities with larger average ratios also had greater variance[9] (Fig 3). Even within the most spaghetti-like city, some routes will still be relatively direct. For example, in Stockholm you could get from the junction of Odengatan and Birger Jarlsgatan to the centre of Svensksundsparken on Skeppsholmen travelling only 9.7% further than a bird (euclidean: 2683 m, network: 2942 m; ratio: 1.097). It is worth noting that this is largely independent of distance, i.e. it is not just a few long and wiggly routes that drive the greater variance. There are long direct routes, and short circuitous ones. For each city the ratio was hastily regressed against euclidean distance, and no consistent pattern emerged, with r-squared values typically around 0.09.
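For anyone wanting to repeat the hasty regression, r-squared for a straight-line fit needs nothing beyond sums. A sketch under obvious assumptions: the `r_squared` helper is mine, and apart from the Stockholm route mentioned above, the distances and ratios are invented.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# illustrative routes: straight-line distance (m) and network:euclidean ratio
# (only the last pair, the Stockholm route, comes from the text)
euclidean = [250, 600, 900, 1400, 2100, 2683]
ratio = [1.45, 1.10, 1.62, 1.21, 1.38, 1.097]

print(f"r-squared: {r_squared(euclidean, ratio):.2f}")
```

A value near 0 means straight-line distance explains almost none of the variation in the ratio, which is what "largely independent of distance" amounts to.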

Fig 3: boxplot of network : euclidean distance ratios for each city, sorted by median. Note the logarithmic x-scale. Hover over a box for maximum, minimum, median and IQR values (given to too many decimal places for no good reason)

A quick comparison with all the statistics generated by osmnx.stats.basic_stats shows, mercifully, that I didn't re-invent the wheel here.[10] Average street circuity is the metric with the greatest similarity to the ratio calculated here, and whilst they are correlated (r=0.33), they are different. ish. Andorra la Vella, with its hairpins, tops the circuity charts, whereas Budapest is amongst the least circuitous. The circuity average is built up from the circuity[11] of individual edges (each edge's length relative to the straight-line distance between its own endpoints), whereas the ratio calculated here is for a route across the network, involving multiple edges. It is understandable that these are different, since the angle at which streets intersect is not accounted for in the former, whereas it is implicitly included in the latter.[12]
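A toy example makes the difference concrete. On a hypothetical grid of perfectly straight 100 m blocks, every edge has a circuity of exactly 1, yet a route that goes three blocks east and four blocks north is still 40% longer than the crow's path. The coordinates below are invented, and the edge-level figure is computed as total edge length over total straight-line length, which is how I understand osmnx to define its average.

```python
from math import dist

# hypothetical route on a perfect grid: 3 blocks east, then 4 blocks north
route = [(0, 0), (100, 0), (200, 0), (300, 0),
         (300, 100), (300, 200), (300, 300), (300, 400)]
edges = list(zip(route, route[1:]))

# every street is dead straight, so each edge is exactly as long as the
# straight line between its own endpoints
edge_lengths = [100.0] * len(edges)
edge_straight = [dist(a, b) for a, b in edges]

# edge-level circuity: blind to the corner in the middle of the route
edge_circuity = sum(edge_lengths) / sum(edge_straight)

# route-level ratio: the corner is exactly what it measures
route_ratio = sum(edge_lengths) / dist(route[0], route[-1])

print(edge_circuity, route_ratio)  # 1.0 1.4
```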

why are the cities different?

To understand the differences in network : euclidean distance ratios between cities, it is necessary to view them in situ, in context (Fig 4).

Fig 4: Map showing routes between random nodes (solid lines) and the associated straight line paths (dashed). Hover over a route to see more details (note: out of respect for your internet speed, only 20% of samples are shown for each city)

The shortest, simplest, answer to explain away any differences is: water.

Valletta, the least euclidean in this cough "study", straddles two harbours[13]; the Danube cuts Budapest in two, and it's wide (~350 m), and the (two) bridges (that fall within my little study area) are a kilometre apart. Paris, by contrast, is admirably euclidean: despite being centred on Île de la Cité in the Seine, it manages to squeeze in 10 bridges in ~3 km. Stockholm's numerous islands and the-opposite-of-numerous bridges mean it has a relatively high median ratio (1.31), and a long tail.

A slightly longer, slightly more nuanced, answer is: water and hills. Vaduz illustrates this most clearly, with its hairpins on Fürst-Franz-Josef-Strasse. Andorra la Vella also has its fair share of wiggles. But they're both tiny, and that will be discussed shortly.

Brussels, Madrid, Sofia and Oslo have the tightest inter-quartile ranges, and Athens, Madrid, Ankara and Rome have the highest average number of streets per node, but there is no obvious relationship to be found between the two.

The qualities that can best tease apart the subtle and not-so-subtle differences are perhaps those that can't readily be described by a quantity[14]: design and history. This is where I would like to talk about Haussmann's renovation of Paris, or how apt it is that Berlin and Helsinki are adjacent to one another in Fig 3, since Carl Ludvig Engel trained as a surveyor in the former before becoming the architect charged with reconstructing the latter. But I shan't, because I'm not remotely qualified to, and I respect you more than to just regurgitate wikipedia at you. Mwah.

but

No self-respecting[15] cough "study" would be complete[16] without a limitations and caveats section. This is that.

sensitive

In version 1[17] of this analysis, the sampling strategy differed in two key ways: firstly, a larger 5 km radius was used instead of 3 km; and secondly, nodes were sampled from anywhere within that 5 km, as opposed to the inner 1.5 km used here. Those results looked different. Stockholm was the least euclidean, Paris the most (with a median ratio of 1.14, about 0.05 less than Madrid scored in this version). Budapest was somewhere in the middle, as was Vienna.

The motivation behind shrinking the area, in addition to the reasons in the method, was to allow smaller capitals (Andorra la Vella, Vaduz) to stand a fighting chance of being treated fairly. An option that was considered was to take the whole area of each capital, but for London would that mean taking all of Greater London, which feels wrong and would include Biggin Hill, or just the City of London, which also feels wrong? Applying a blanket radius was simpler than manually selecting the areas individually.

So, treat these "results" accordingly.

perhaps wrong

When having a look at some of these results, Belgrade's anomalously long tail stood out. The Sava can be crossed as a pedestrian on both the Brankov and Gazela bridges, and OpenStreetMap shows that to be the case. However, it seems that either (a) the network is being incorrectly extracted from OpenStreetMap in the first instance; or (b) the bridges are lost during the simplification process. As such, many routes that cross the Sava are being heavily penalised as they detour south to the Ada Bridge.

So, treat these "results" accordingly.

likely incomplete

The search space is small enough that one option would be to skip the sampling altogether, and either route between every pair of nodes, or pick a single node at the centre and calculate network distances from that node to every other node (NetworkX has a function for just that).
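The single-centre option boils down to one run of Dijkstra's algorithm. The sketch below uses a hand-rolled version from the standard library so that it runs without any dependencies; on a real osmnx graph, `networkx.single_source_dijkstra_path_length` with `weight='length'` is presumably the function alluded to. The four-node network, its coordinates and the `network_distances` helper are all invented for illustration.

```python
import heapq
from math import dist


def network_distances(adjacency, source):
    """Dijkstra: shortest network distance from source to every reachable node."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, length in adjacency[node]:
            candidate = d + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


# hypothetical toy network: projected node coordinates in metres
coords = {"centre": (0, 0), "a": (100, 0), "b": (100, 100), "c": (200, 100)}
# undirected edges as (node, node, length in metres)
edges = [("centre", "a", 100.0), ("a", "b", 100.0), ("b", "c", 100.0)]

adjacency = {node: [] for node in coords}
for u, v, length in edges:
    adjacency[u].append((v, length))
    adjacency[v].append((u, length))

network = network_distances(adjacency, "centre")

# network:euclidean ratio from the centre to every other node
ratios = {
    node: network[node] / dist(coords["centre"], coords[node])
    for node in coords
    if node != "centre"
}

for node, r in ratios.items():
    print(f"{node}: {r:.3f}")
```

One caveat with this approach: every ratio shares the same origin, so the values say more about how well connected the centre is than about the city as a whole.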

conclusion

You're probably walking ~1.3x further than a crow is flying. On average. ish.

footnotes


  1. minor grumble: the UK edition has the least interesting cover-art. the german and italian covers are great. ↩︎

  2. not very science-y of me, sorry ↩︎

  3. coupled with some impatience, and not wanting to have to think too hard about writing efficient code ↩︎

  4. as does the friend who i asked ↩︎

  5. this is where i could have included a definition, but chose not to. sorry, you're on your own ↩︎

  6. (ed.) the same cannot be said of the author ↩︎

  7. while being mindful that at some point false positives are going to pop-up ↩︎

  8. small grievance: studies that state this like it is surprising. what would be surprising is if everything was the same everywhere ↩︎

  9. i.e. a bigger coefficient of variation ↩︎

  10. i sort of did. but that's ok. wheels are great ↩︎

  11. well, duh ↩︎

  12. fighting the urge to scratch the what-angle-do-streets-typically-meet-at itch ↩︎

  13. and efficiently highlights a shortcoming of this cough "study": there are ferries ↩︎

  14. unfortunately ↩︎

  15. not that i, or this study are capable of that ↩︎

  16. this also isn't ↩︎

  17. yes, believe it or not this is the improved version 2 ↩︎