As a mobile developer, you'd never imagine having to work with geospatial indexes and geolocation queries; real life, though, has a different opinion. In one of my recent projects, a dating app, we needed to be able to find users that were physically near each other. To accomplish this I dusted off some of my old backend tricks, and I want to share the findings with you.
There are several approaches to geolocation queries that vary in speed and accuracy, so let's talk through the options. This post is aimed at mid-level developers, but hardcore geeks will also find interesting comparisons and approaches at the end of the article.

Routine tasks like “find all app users near city X” or “sort something by distance from the current user” are really common requests. In one of the legacy projects I’ve been supporting, this task was solved in a pretty straightforward way. They simply used the following query to select coordinates fitting the necessary square:
WHERE lat > 10 AND lat < 20 AND lng > 30 AND lng < 45
and then additionally filtered the found results in a Python script, using a distance calculation. I’m sure you can imagine the performance of such an approach and the number of edge conditions it had. Of course, all the edge conditions could be resolved and carefully scaffolded with unit tests, but let’s look at how this could be done at the DB level. And no, we’re not doing that in order to shift the responsibility onto the DBA.
The last time this strategy was used was during the development of our secure dating startup Zin, where geolocation was a critical feature. We needed to find all posts that were close to a user and sort them by distance. After some research it appeared we had three options (more precisely, two and a half), which we’ll cover.
For clarification, I’ll say that we needed the great-circle distance, without terrain consideration. Also, our error tolerance was pretty generous, so the main criterion for us was implementation simplicity. Looking ahead, I can say that precision turned out to be better than we expected, and I’ll show that in practice.
Test data
As our DB contains some sensitive data, I’ll use two other datasets for the examples. The first one is a big table of cities and their coordinates taken from http://download.geonames.org/export/dump/. It’s a bit bloated, as it contains not only cities but also their districts, notable landmarks, and other entries; that’s fine, though, since the main goal here is the overall amount of data. So, here is the structure of the first table:
CREATE TABLE cities (
    id     integer DEFAULT nextval('public.cities_seq'::regclass) NOT NULL,
    name   character varying NOT NULL,
    region character varying NOT NULL,
    lat    double precision NOT NULL,
    lng    double precision NOT NULL
);
ALTER TABLE ONLY cities
    ADD CONSTRAINT cities_pkey PRIMARY KEY (id);

CREATE INDEX cities_name_low_idx ON cities USING btree (lower((name)::text));
CREATE INDEX cities_location_idx ON cities USING gist (ll_to_earth(lat, lng));
It’s pretty simple: the city name, the code of the country it belongs to, and a pair of attributes to store latitude and longitude. Additionally, we build two indices: a BTREE one for fast lookup of cities by name, and the one we need for location queries. The latter uses GIST (huge thanks to the PostgreSQL team for this magnificent mechanism) and the ll_to_earth function that we’ll discuss later on.
Generally, we have two ways of storing this information: a pair of numbers (usually float8) in separate columns, or one of the domain-specific types we’ll discuss later (point, earth, geometry). There is no notable difference in performance, so separate columns are the more universal option; but of course, it depends on the particular requirements of the project.
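For illustration, the single-column variant might look like this hypothetical table (cities_pt is just an example name, not part of this project’s schema); remember that point stores the X coordinate first, so longitude goes in front:
CREATE TABLE cities_pt (
    id       integer PRIMARY KEY,
    name     character varying NOT NULL,
    location point NOT NULL  -- point(lng, lat): x is longitude, y is latitude
);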
I won’t post the helper Python scripts I used for data collection and DB population; they are really straightforward. If you’d like to take a sneak peek at them, here is the GitHub repo: https://github.com/cleg/GeoReqTest
After transferring the whole dataset to PostgreSQL, let’s check that we have a big enough table to play with.
SELECT COUNT(*),
       pg_size_pretty(pg_relation_size('cities')) AS data,
       pg_size_pretty(pg_indexes_size('cities')) AS idxs
FROM cities;
count | data | idxs
----------+--------+---------
12005938 | 848 MB | 1836 MB
(1 row)
Time: 2073.919 ms (00:02.074)
The second dataset, which I created for a more visual demonstration of complicated queries, contains two much smaller tables. The first, cities_min, contains the biggest cities of Montenegro (15 in total); the second one, named pizzerias, contains, you got it, a list of local pizzerias (172 total) that I got from Foursquare. The scraping script (pun intended) is also available in the abovementioned repo.
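The exact DDL for both tables lives in the repo; a plausible sketch (my approximation, not copied from the project) could be:
CREATE TABLE cities_min (LIKE cities INCLUDING ALL);

CREATE TABLE pizzerias (
    id   serial PRIMARY KEY,
    name character varying NOT NULL,
    lat  double precision NOT NULL,
    lng  double precision NOT NULL
);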
Now that we have the intro out of the way, let’s proceed to the main topic: discussing the options we have.
PostGIS
This is the “hard way” we wanted to avoid, as hinted in the title. But I’ll cover it briefly, as we need to know why it’s not the best choice for us.
Of course, this method has some really solid advantages: it’s the most versatile and precise, it has no edge conditions, and it allows changing the spatial reference identifier (SRID).
But that comes at a big cost, starting with tons of heavy dependencies (a simple installation via Homebrew requires more than 30 additional formulas, including behemoths like GCC, Qt, Boost, etc.) and ending with overall complexity (non-standard data types, 5000+ rows of additional tables and views with dictionaries). Also, it’s slower than the other methods; not by a big margin, but still, why spend resources on precision we don’t need?
Luckily, the core team of PostgreSQL developers created a really convenient extension containing the solution we used. Installation is really simple (the CASCADE clause also pulls in the cube extension that earthdistance depends on):
CREATE EXTENSION earthdistance CASCADE;
This extension gives two options for working with coordinates.
<@> operator
I’m not sure what to call this syntactic chimera (maybe the “eye operator”?), but its essence is simple: it calculates the great-circle distance I mentioned before between two points on the Earth’s surface. To represent coordinates it uses the built-in point type, but pay attention: coordinates are written as point(lng, lat), with longitude in first place. It’s counter-intuitive, as in speech we’re used to naming latitude first, but longitude is closer to the X-axis by its nature, so it goes first. The second peculiarity to be aware of is that this operator returns results in statute miles, so if you live outside of the US/UK, you’ll find yourself multiplying or dividing by 1609.344 pretty often.
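To make both quirks tangible, here is a tiny sanity check you can run in psql (the second point is roughly Podgorica, chosen just for illustration):
-- longitude goes first; the raw result is in statute miles, hence the 1609.344 factor
SELECT point(19.1, 42.1) <@> point(19.26, 42.44) AS miles,
       (point(19.1, 42.1) <@> point(19.26, 42.44)) * 1609.344 AS meters;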
The advantages of this method are pretty obvious: it’s simple, and it uses basic PostgreSQL datatypes. But it has a lot of drawbacks: precision (the Earth is treated as an ideal sphere), edge conditions around the poles and the 180th meridian, and bad indexing support. On small datasets, in cases far from those edge conditions, this method does its job at an acceptable level.
Let’s see an example: we need to find places within 10 km of the city of Bar (yes, it’s a real name), which has the following coordinates: 42.1° N, 19.1° E.
SELECT *, (point(lng, lat) <@> point(19.1, 42.1)) * 1609.344 AS distance
FROM cities
WHERE (point(lng, lat) <@> point(19.1, 42.1)) < (10000 / 1609.344)
ORDER BY distance;
id | name | lat | lng | region | distance
----------+-----------------------------+----------+----------+--------+--------------------
27696013 | Bjelisi | 42.10083 | 19.10417 | ME | 356.2025188423806
27699128 | Topolica | 42.09833 | 19.09417 | ME | 515.6034584176577
27696061 | Bar | 42.0937 | 19.09841 | ME | 712.7044816072382
27708312 | King Nikola's Palace | 42.10062 | 19.09129 | ME | 721.903811067934
27708268 | Princess | 42.10158 | 19.09132 | ME | 737.3598549548324
27708234 | Bar Port | 42.09701 | 19.09132 | ME | 789.5619694327141
27708184 | Castello | 42.10701 | 19.09623 | ME | 839.2352057529451
27699126 | Popovici | 42.09194 | 19.10528 | ME | 996.5017600203865
27708313 | Antivari Pier (historical) | 42.09708 | 19.08834 | ME | 1015.3313902853366
27695933 | Rikavac | 42.09278 | 19.08972 | ME | 1167.8829329656053
…
(232 rows)
Time: 2844.305 ms (00:02.844)
So, the query is pretty simple, but the execution time looks pitiful; even without running EXPLAIN, it’s obvious that a plain sequential scan is at work here. Of course, 3 seconds for a scan of 12M+ rows is impressive in its own way, but it’s not usable in production. Usually, to optimize such queries, we can use the <-> operator, which calculates the “straight line” distance between points. It works with GIST indexes, so we can fetch candidates quickly and then refine the results with <@>; but it doesn’t save us from the edge conditions anyway.
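Here is a hedged sketch of that pattern; it assumes an extra GiST index on the raw points (which our schema above doesn’t define) and oversamples candidates with LIMIT before refining:
-- hypothetical supporting index:
-- CREATE INDEX cities_point_idx ON cities USING gist (point(lng, lat));
SELECT *
FROM (
    -- KNN-GiST returns rows ordered by flat 2D distance (<->) cheaply
    SELECT *, (point(lng, lat) <@> point(19.1, 42.1)) AS miles
    FROM cities
    ORDER BY point(lng, lat) <-> point(19.1, 42.1)
    LIMIT 1000
) candidates
WHERE miles < 10000 / 1609.344  -- refine with the spherical <@> distance
ORDER BY miles;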
Do you know that feeling when the author of a book or an article pushes you towards the solution they consider “the most correct”? It’s the right time to recall it, as we’re getting close to the third method (or, more precisely, the second-and-a-half, since it too lives in the earthdistance extension).
earth_distance function and its little helpers
First of all, I’d like to thank the PostgreSQL development team and all its contributors. I think they are some of the brightest developers on the planet, and making this DB open source also makes them some of the most generous. Why do I say so? Let’s see.
As we’ve seen above, the “2D way” (the <@> and <-> operators) doesn’t work perfectly. So what can we do? Shift to 3D, of course, and that’s what earth_distance does. With the help of a few additional functions, we can translate any latitude+longitude pair into a three-dimensional point in space and use it for distance computation without edge conditions. As PostgreSQL doesn’t have a data type for a 3D point, the authors used the cube data type, which represents 3D cubes; points are represented as zero-sized cubes, simple and efficient. On top of it they defined the domain-specific type earth, a simple descendant of the cube type that performs some additional sanity checks (like “this point isn’t far from the planet’s surface”).
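You can peek at this representation directly; the cube below is the very same 3D point that will show up in the EXPLAIN output later:
SELECT ll_to_earth(42.1, 19.1);
-- (4471920.234086516, 1548541.2222539173, 4276093.607391206)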
First, let’s see some useful functions.
- ll_to_earth(latitude, longitude) — translates a latitude, longitude pair into the earth type, calculating the 3D point location
- earth_distance(point1, point2) — computes the distance in meters between two points in earth format
- earth_box(point, radius) — creates a bounding cube sized to contain all the points that are no farther than radius meters from the given point. This cube is an approximation, but we can use it for searching with the @> (containment) operator, which is efficiently supported by GIST indexes; we just need to refine the result afterwards
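Before the full query, a minimal demo of the helpers working together (the second point sits about 5.6 km due north of the first, so it falls inside the 10 km box):
SELECT earth_distance(ll_to_earth(42.1, 19.1), ll_to_earth(42.15, 19.1)) AS meters,     -- ~5.6 km
       earth_box(ll_to_earth(42.1, 19.1), 10000) @> ll_to_earth(42.15, 19.1) AS in_box; -- true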
Let’s see an example of a query similar to the one we used for the <@> demonstration.
SELECT *, ROUND(earth_distance(ll_to_earth(42.1, 19.1), ll_to_earth(lat, lng))::NUMERIC, 2) AS distance
FROM cities
WHERE earth_box(ll_to_earth(42.1, 19.1), 10000) @> ll_to_earth(lat, lng)
  AND earth_distance(ll_to_earth(42.1, 19.1), ll_to_earth(lat, lng)) < 10000
ORDER BY distance;
id | name | lat | lng | region | distance
----------+-----------------------------+----------+----------+--------+----------
27696013 | Bjelisi | 42.10083 | 19.10417 | ME | 356.60
27699128 | Topolica | 42.09833 | 19.09417 | ME | 516.18
27696061 | Bar | 42.0937 | 19.09841 | ME | 713.51
27708312 | King Nikola's Palace | 42.10062 | 19.09129 | ME | 722.72
27708268 | Princess | 42.10158 | 19.09132 | ME | 738.19
27708234 | Bar Port | 42.09701 | 19.09132 | ME | 790.45
27708184 | Castello | 42.10701 | 19.09623 | ME | 840.18
27699126 | Popovici | 42.09194 | 19.10528 | ME | 997.62
27708313 | Antivari Pier (historical) | 42.09708 | 19.08834 | ME | 1016.48
27695933 | Rikavac | 42.09278 | 19.08972 | ME | 1169.20
…
(232 rows)
Time: 102.084 ms
As you can see, the performance is much better now: even on a 2016 MacBook Pro it took about 100 ms to process 12M+ records. That’s something we can use in production code. Let’s double-check our query with EXPLAIN.
Gather Merge (cost=45002.12..45441.98 rows=3770 width=69)
Workers Planned: 2
-> Sort (cost=44002.10..44006.81 rows=1885 width=69)
Sort Key: (round((sec_to_gc(cube_distance('(4471920.234086516, 1548541.2222539173, 4276093.607391206)'::cube, (ll_to_earth(lat, lng))::cube)))::numeric, 2))
-> Parallel Bitmap Heap Scan on cities (cost=651.34..43899.55 rows=1885 width=69)
Recheck Cond: ('(4461920.235110745, 1538541.2232781458, 4266093.608415434),(4481920.233062288, 1558541.2212296887, 4286093.606366977)'::cube @> (ll_to_earth(lat, lng))::cube)
Filter: (sec_to_gc(cube_distance('(4471920.234086516, 1548541.2222539173, 4276093.607391206)'::cube, (ll_to_earth(lat, lng))::cube)) < '10000'::double precision)
-> Bitmap Index Scan on cities_location_idx (cost=0.00..650.21 rows=13572 width=0)
Index Cond: ((ll_to_earth(lat, lng))::cube <@ '(4461920.235110745, 1538541.2232781458, 4266093.608415434),(4481920.233062288, 1558541.2212296887, 4286093.606366977)'::cube)
Index scan + refinement gives us a good result. So, if your DB is big, it’s always a good idea to narrow the query bounds. Situations where you need to search the whole globe are really rare; most often we limit our search to some particular region (city, country, state).
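As a hedged sketch of such narrowing, suppose we only care about Montenegro; an index on region (which our schema above doesn’t define) would let the planner combine both predicates:
-- hypothetical supporting index: CREATE INDEX cities_region_idx ON cities (region);
SELECT *
FROM cities
WHERE region = 'ME'  -- cheap pre-filter before the spatial conditions
  AND earth_box(ll_to_earth(42.1, 19.1), 10000) @> ll_to_earth(lat, lng)
  AND earth_distance(ll_to_earth(42.1, 19.1), ll_to_earth(lat, lng)) < 10000;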
You may have wondered what order of inaccuracy we get with the “approximation” methods; at least, I was very curious about that. It’s really simple to find out — note that the comparison query below needs PostGIS installed to provide the reference ST_Distance value.
SELECT name,
       ST_Distance(ST_MakePoint(lng, lat)::geography, ST_MakePoint(19.1, 42.1)::geography) AS postgis_distance,
       earth_distance(ll_to_earth(42.1, 19.1), ll_to_earth(lat, lng)) AS earth_distance,
       (point(lng, lat) <@> point(19.1, 42.1)) * 1609.344 AS point_distance
FROM cities
WHERE earth_box(ll_to_earth(42.1, 19.1), 10000) @> ll_to_earth(lat, lng)
  AND earth_distance(ll_to_earth(42.1, 19.1), ll_to_earth(lat, lng)) < 10000
ORDER BY earth_distance;
name | postgis_distance | earth_distance | point_distance
-----------------------------+------------------+--------------------+--------------------
Bjelisi | 357.05152825 | 356.6040157475758 | 356.2025188423806
Topolica | 516.71294939 | 516.1846255398445 | 515.6034584176577
Bar | 712.02799946 | 713.5078129379874 | 712.7044816072382
King Nikola's Palace | 723.77941534 | 722.717511506365 | 721.903811067934
Princess | 739.14564214 | 738.1909768134576 | 737.3598549548324
Bar Port | 791.12180279 | 790.4519313791742 | 789.5619694327141
Castello | 838.76185601 | 840.1811573389859 | 839.2352057529451
Popovici | 996.13740888 | 997.6249760322966 | 996.5017600203865
Antivari Pier (historical) | 1017.61930422 | 1016.4758302850378 | 1015.3313902853366
Rikavac | 1168.91272962 | 1169.199322822384 | 1167.8829329656053
…
Knjezdovo | 9751.02688274 | 9743.710964250973 | 9732.740617250734
Rt Sapavica | 9776.31845857 | 9772.941912657994 | 9761.938654824216
Livrsanik | 9790.74772202 | 9781.953343427898 | 9770.939939713666
Kabeta | 9803.27894257 | 9791.177885397401 | 9780.154095863507
Pecurice | 9825.29031086 | 9833.21849793169 | 9822.147375291843
Dedici | 9825.38102702 | 9836.570130079455 | 9825.495233870668
Kozjak | 9860.53074801 | 9882.053049767033 | 9870.926944792083
Livari | 9899.30877216 | 9888.084255774567 | 9876.95136032431
Ocas | 9877.66304883 | 9893.775540351273 | 9882.636237141522
Kamisora | 9932.59217935 | 9926.356311450303 | 9915.180325874655
Lunje | 9955.46356913 | 9951.533852256645 | 9940.329519539344
Pepici | 9950.93625773 | 9971.099451151647 | 9959.873089720966
Bajraktari | 9993.82232318 | 9992.844754025287 | 9981.593909774985
(228 rows)
Time: 500.176 ms
As you can see, the error is about 30 meters per 10 km in the worst cases, which is more than acceptable in most scenarios. Actually, we were expecting a much larger error, and even that would have been acceptable for us.
Let’s summarise the information about the “cubes method”. Advantages: speed, simplicity, and the possibility to “redefine” the planet’s radius (helpful for non-Earth calculations or for re-defining the units of measurement). And only one disadvantage is left: precision is still a bit lower than with PostGIS, as the Earth is still considered spherical. But as we’ve seen, the error is at an acceptable level for most applications.
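The radius lives in the earth() function (meters by default), so changing units is a one-liner; a sketch, best done before any data or indexes depend on it:
-- replace the extension's earth() so all distances come back in kilometers
CREATE OR REPLACE FUNCTION earth() RETURNS float8
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
    AS 'SELECT 6378.168';  -- the default is 6378168 (meters)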
More complex queries
Now that we have learned the basics, we can discuss more complicated queries. I’ll show a couple of examples using the smaller dataset with the pizzerias.
For example, what if we needed to find the farthest pizzerias and detect which cities they belong to? We’ll assume that a pizzeria “belongs” to the city whose center is nearest to the venue’s location. Of course, SQL has dozens of ways to express this query, but I’ll use a version with the most straightforward search logic.
WITH farthest AS (
    SELECT name,
        ll_to_earth(lat, lng) AS point,
        earth_distance(ll_to_earth(lat, lng), ll_to_earth(42.1, 19.1)) AS distance
    FROM pizzerias
    ORDER BY distance DESC
    LIMIT 25
)
SELECT c.name AS city, p.name, p.distance
FROM farthest p,
    LATERAL (SELECT name FROM cities_min c
             ORDER BY earth_distance(ll_to_earth(c.lat, c.lng), p.point)
             LIMIT 1) c;
city | name | distance
--------------+---------------------------+--------------------
Pljevlja | Brut | 141584.33308274313
Zabljak | Balkan | 117451.9785288297
Zabljak | Caffe pizzeria Balkan | 117435.32995569201
Zabljak | Lazar | 117308.9999099511
Bijelo Polje | cezar | 116663.29228130652
Bijelo Polje | Orao | 116642.62285195319
Bijelo Polje | Pierre | 116580.71776744149
Bijelo Polje | Padrino | 104133.00996894344
Kolasin | Padrino Pizzerua | 103643.11705954213
Kolasin | Pizzeria Kapri | 87709.3205690596
Niksic | Chaplin | 76339.67053537675
Niksic | Pizzeria Garfield | 76139.35825221038
Niksic | Stari Nikšić | 76118.70681853226
Niksic | Piazza Restaurant & Caffe | 76110.42052261815
Niksic | K2 | 75899.71658365549
Niksic | Šef | 75850.41193677587
Herceg Novi | Grifonee | 62681.009366742284
Herceg Novi | Nautilus | 62597.163060010935
Herceg Novi | Caffe Pizzeria Biblioteka | 62586.73859525146
Herceg Novi | Forte | 61781.28207372379
Herceg Novi | Atelier | 61018.18974275154
Herceg Novi | Popaj | 60869.123535469735
Herceg Novi | Cafe Pizzeria Trg | 60721.08839248005
Herceg Novi | Caffe Pizzeria 5 | 60702.71868706536
Herceg Novi | Pic Nic | 60417.41134618965
(25 rows)
Time: 8.487 ms
First, we select the 25 farthest venues, sorting by distance in descending order. Then we simply “traverse” this data, selecting the nearest city for each row.
One more example: let’s find the most “pizza-loving” cities, assuming that such cities have the biggest number of pizzerias within a 10 km radius of the center. A more logical approach would be to calculate the ratio of the pizzeria count to the city’s population, but I didn’t want to search for demographic data, so we’ll just count the absolute values.
SELECT c.name, count(cp)
FROM cities_min c,
LATERAL (SELECT "name"
FROM pizzerias p
WHERE earth_distance(ll_to_earth(p.lat, p.lng), ll_to_earth(c.lat, c.lng)) < 10000
) AS cp
GROUP BY c.name
ORDER BY count(cp) DESC;
name | count
--------------+-------
Podgorica | 46
Tivat | 36
Kotor | 31
Budva | 23
Bar | 19
Herceg Novi | 16
Ulcinj | 16
Petrovac | 11
Niksic | 6
Cetinje | 6
Danilovgrad | 3
Bijelo Polje | 3
Zabljak | 3
Pljevlja | 1
Kolasin | 1
(15 rows)
Time: 65.703 ms
The results speak for themselves. Besides the capital (which has the biggest population), coastal cities like pizza much more than the mountain areas do. Even trendy ski resorts don’t skew the data very much.

Conclusion
- PostgreSQL has two “built-in” ways of dealing with geolocation data: the <@> operator and the earth_distance family of functions
- It’s better to use the latter, but they can be combined if necessary
- PostGIS is the most versatile and precise method, but it’s also the slowest and most complex one; in the vast majority of cases you can use a simpler method
- Narrowing your query down to some bounding area is always a great idea from a performance standpoint