Bienvenue sur PostGIS.fr

Bienvenue sur PostGIS.fr , le site de la communauté des utilisateurs francophones de PostGIS.

PostGIS ajoute le support d'objets géographique à la base de données PostgreSQL. En effet, PostGIS "spatialise" le serverur PostgreSQL, ce qui permet de l'utiliser comme une base de données SIG.

Maintenu à jour, en fonction de nos disponibilités et des diverses sorties des outils que nous testons, nous vous proposons l'ensemble de nos travaux publiés en langue française.

source: trunk/workshop-foss4g/joins.rst @ 19

Revision 1, 11.6 KB checked in by djay, 13 years ago (diff)

Initial import of the svn tree

RevLine 
[1]1.. _joins:
2
3Section 12: Spatial Joins
4=========================
5
6Spatial joins are the bread-and-butter of spatial databases.  They allow you to combine information from different tables by using spatial relationships as the join key.  Much of what we think of as "standard GIS analysis" can be expressed as spatial joins.
7
8In the previous section, we explored spatial relationships using a two-step process: first we extracted a subway station point for 'Broad St'; then, we used that point to ask further questions such as "what neighborhood is the 'Broad St' station in?"
9
10Using a spatial join, we can answer the question in one step, retrieving information about the subway station and the neighborhood that contains it:
11
12.. code-block:: sql
13
14  SELECT
15    subways.name AS subway_name,
16    neighborhoods.name AS neighborhood_name,
17    neighborhoods.boroname AS borough
18  FROM nyc_neighborhoods AS neighborhoods
19  JOIN nyc_subway_stations AS subways
20  ON ST_Contains(neighborhoods.the_geom, subways.the_geom)
21  WHERE subways.name = 'Broad St';
22
23:: 
24
25   subway_name | neighborhood_name  |  borough 
26  -------------+--------------------+-----------
27   Broad St    | Financial District | Manhattan
28
29We could have joined every subway station to its containing neighborhood, but in this case we wanted information about just one.  Any function that provides a true/false relationship between two tables can be used to drive a spatial join, but the most commonly used ones are: :command:`ST_Intersects`, :command:`ST_Contains`, and :command:`ST_DWithin`.
30
31Join and Summarize
32------------------
33
34The combination of a ``JOIN`` with a ``GROUP BY`` provides the kind of analysis that is usually done in a GIS system.
35
36For example: **"What is the population and racial make-up of the neighborhoods of Manhattan?"** Here we have a question that combines information from about population from the census with the boundaries of neighborhoods, with a restriction to just one borough of Manhattan.
37
38.. code-block:: sql
39
40  SELECT
41    neighborhoods.name AS neighborhood_name,
42    Sum(census.popn_total) AS population,
43    Round(100.0 * Sum(census.popn_white) / Sum(census.popn_total),1) AS white_pct,
44    Round(100.0 * Sum(census.popn_black) / Sum(census.popn_total),1) AS black_pct
45  FROM nyc_neighborhoods AS neighborhoods
46  JOIN nyc_census_blocks AS census
47  ON ST_Intersects(neighborhoods.the_geom, census.the_geom)
48  WHERE neighborhoods.boroname = 'Manhattan'
49  GROUP BY neighborhoods.name
50  ORDER BY white_pct DESC;
51
52::
53
54   neighborhood_name  | population | white_pct | black_pct
55 ---------------------+------------+-----------+-----------
56  Carnegie Hill       |      19909 |      91.6 |       1.5
57  North Sutton Area   |      21413 |      90.3 |       1.2
58  West Village        |      27141 |      88.1 |       2.7
59  Upper East Side     |     201301 |      87.8 |       2.5
60  Greenwich Village   |      57047 |      84.1 |       3.3
61  Soho                |      15371 |      84.1 |       3.3
62  Murray Hill         |      27669 |      79.2 |       2.3
63  Gramercy            |      97264 |      77.8 |       5.6
64  Central Park        |      49284 |      77.8 |      10.4
65  Tribeca             |      13601 |      77.2 |       5.5
66  Midtown             |      70412 |      75.9 |       5.1
67  Chelsea             |      51773 |      74.7 |       7.4
68  Battery Park        |       9928 |      74.1 |       4.9
69  Upper West Side     |     212499 |      73.3 |      10.4
70  Financial District  |      17279 |      71.3 |       5.3
71  Clinton             |      26347 |      64.6 |      10.3
72  East Village        |      77448 |      61.4 |       9.7
73  Garment District    |       6900 |      51.1 |       8.6
74  Morningside Heights |      41499 |      50.2 |      24.8
75  Little Italy        |      14178 |      39.4 |       1.2
76  Yorkville           |      57800 |      31.2 |      33.3
77  Inwood              |      50922 |      29.3 |      14.9
78  Lower East Side     |     104690 |      28.3 |       9.0
79  Washington Heights  |     187198 |      26.9 |      16.3
80  East Harlem         |      62279 |      20.2 |      46.2
81  Hamilton Heights    |      71133 |      14.6 |      41.1
82  Chinatown           |      18195 |      10.3 |       4.2
83  Harlem              |     125501 |       5.7 |      80.5
84
85
86What's going on here? Notionally (the actual evaluation order is optimized under the covers by the database) this is what happens:
87
88#. The ``JOIN`` clause creates a virtual table that includes columns from both the neighborhoods and census tables.
89#. The ``WHERE`` clause filters our virtual table to just rows in Manhattan.
90#. The remaining rows are grouped by the neighborhood name and fed through the aggregation function to :command:`Sum()` the population values.
91#. After a little arithmetic and formatting (e.g., ``GROUP BY``, ``ORDER BY``) on the final numbers, our query spits out the percentages.
92
93.. note:: 
94
95   The ``JOIN`` clause combines two ``FROM`` items.  By default, we are using an ``INNER JOIN``, but there are four other types of joins. For further information see the `join_type <http://www.postgresql.org/docs/8.1/interactive/sql-select.html>`_ definition in the PostgreSQL documentation.
96
97We can also use distance tests as a join key, to create summarized "all items within a radius" queries. Let's explore the racial geography of New York using distance queries.
98
99First, let's get the baseline racial make-up of the city.
100
101.. code-block:: sql
102
103  SELECT
104    100.0 * Sum(popn_white) / Sum(popn_total) AS white_pct,
105    100.0 * Sum(popn_black) / Sum(popn_total) AS black_pct,
106    Sum(popn_total) AS popn_total
107  FROM nyc_census_blocks;
108
109:: 
110
111        white_pct      |      black_pct      | popn_total
112  ---------------------+---------------------+------------
113   44.6586020115685295 | 26.5945063345703034 |    8008278
114
115
116So, of the 8M people in New York, about 44% are "white" and 26% are "black".
117
118Duke Ellington once sang that "You / must take the A-train / To / go to Sugar Hill way up in Harlem." As we saw earlier, Harlem has far and away the highest African-American population in Manhattan (80.5%). Is the same true of Duke's A-train?
119
120First, note that the contents of the ``nyc_subway_stations`` table ``routes`` field is what we are interested in to find the A-train. The values in there are a little complex.
121
122.. code-block:: sql
123
124  SELECT DISTINCT routes FROM nyc_subway_stations;
125 
126:: 
127
128 A,C,G
129 4,5
130 D,F,N,Q
131 5
132 E,F
133 E,J,Z
134 R,W
135
136.. note::
137
138   The ``DISTINCT`` keyword eliminates duplicate rows from the result.  Without the ``DISTINCT`` keyword, the query above identifies 491 results instead of 73.
139   
140So to find the A-train, we will want any row in ``routes`` that has an 'A' in it. We can do this a number of ways, but today we will use the fact that :command:`strpos(routes,'A')` will return a non-zero number if 'A' is in the routes field.
141
142.. code-block:: sql
143
144   SELECT DISTINCT routes
145   FROM nyc_subway_stations AS subways
146   WHERE strpos(subways.routes,'A') > 0;
147   
148::
149
150  A,B,C
151  A,C
152  A
153  A,C,G
154  A,C,E,L
155  A,S
156  A,C,F
157  A,B,C,D
158  A,C,E
159 
160Let's summarize the racial make-up of within 200 meters of the A-train line.
161
162.. code-block:: sql
163
164  SELECT
165    100.0 * Sum(popn_white) / Sum(popn_total) AS white_pct,
166    100.0 * Sum(popn_black) / Sum(popn_total) AS black_pct,
167    Sum(popn_total) AS popn_total
168  FROM nyc_census_blocks AS census
169  JOIN nyc_subway_stations AS subways
170  ON ST_DWithin(census.the_geom, subways.the_geom, 200)
171  WHERE strpos(subways.routes,'A') > 0;
172
173::
174
175        white_pct      |      black_pct      | popn_total
176  ---------------------+---------------------+------------
177   42.0805466940877366 | 23.0936148851067964 |     185259
178
179So the racial make-up along the A-train isn't radically different from the make-up of New York City as a whole.
180
181Advanced Join
182-------------
183
184In the last section we saw that the A-train didn't serve a population that differed much from the racial make-up of the rest of the city. Are there any trains that have a non-average racial make-up?
185
186To answer that question, we'll add another join to our query, so that we can simultaneously calculate the make-up of many subway lines at once. To do that, we'll need to create a new table that enumerates all the lines we want to summarize.
187
188.. code-block:: sql
189
190    CREATE TABLE subway_lines ( route char(1) );
191    INSERT INTO subway_lines (route) VALUES
192      ('A'),('B'),('C'),('D'),('E'),('F'),('G'),
193      ('J'),('L'),('M'),('N'),('Q'),('R'),('S'),
194      ('Z'),('1'),('2'),('3'),('4'),('5'),('6'),
195      ('7');
196
197Now we can join the table of subway lines onto our original query.
198
199.. code-block:: sql
200
201    SELECT
202      lines.route,
203      Round(100.0 * Sum(popn_white) / Sum(popn_total), 1) AS white_pct,
204      Round(100.0 * Sum(popn_black) / Sum(popn_total), 1) AS black_pct,
205      Sum(popn_total) AS popn_total
206    FROM nyc_census_blocks AS census
207    JOIN nyc_subway_stations AS subways
208    ON ST_DWithin(census.the_geom, subways.the_geom, 200)
209    JOIN subway_lines AS lines
210    ON strpos(subways.routes, lines.route) > 0
211    GROUP BY lines.route
212    ORDER BY black_pct DESC;
213
214::
215
216     route | white_pct | black_pct | popn_total
217    -------+-----------+-----------+------------
218     S     |      30.1 |      59.5 |      32730
219     3     |      34.3 |      51.8 |     201888
220     2     |      33.6 |      45.5 |     535414
221     5     |      32.1 |      45.1 |     407324
222     C     |      41.3 |      35.9 |     430194
223     4     |      34.7 |      30.9 |     328292
224     B     |      36.1 |      30.6 |     261186
225     Q     |      52.9 |      26.3 |     259820
226     J     |      29.5 |      23.6 |     126764
227     A     |      42.1 |      23.1 |     370518
228     Z     |      29.5 |      21.5 |      81493
229     D     |      39.8 |      20.9 |     233855
230     G     |      44.8 |      20.0 |     138602
231     L     |      53.9 |      17.1 |     104140
232     6     |      52.7 |      16.3 |     257769
233     1     |      54.8 |      12.6 |     659028
234     F     |      60.0 |       8.6 |     438212
235     M     |      50.0 |       7.8 |     166721
236     E     |      69.4 |       5.3 |      86118
237     R     |      57.7 |       4.8 |     389124
238     7     |      42.4 |       3.8 |     107543
239
240
241As before, the joins create a virtual table of all the possible combinations available within the constraints of the ``JOIN ON`` restrictions, and those rows are then fed into a ``GROUP`` summary. The spatial magic is in the ``ST_DWithin`` function, that ensures only census blocks close to the appropriate subway stations are included in the calculation.
242
243Function List
244-------------
245
246`ST_Contains(geometry A, geometry B) <http://postgis.org/docs/ST_Contains.html>`_: Returns true if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A.
247
248`ST_DWithin(geometry A, geometry B, radius) <http://postgis.org/docs/ST_DWithin.html>`_: Returns true if the geometries are within the specified distance of one another.
249
250`ST_Intersects(geometry A, geometry B) <http://postgis.org/docs/ST_Intersects.html>`_: Returns TRUE if the Geometries/Geography "spatially intersect" - (share any portion of space) and FALSE if they don't (they are Disjoint).
251
252`round(v numeric, s integer) <http://www.postgresql.org/docs/7.4/interactive/functions-math.html>`_: PostgreSQL math function that rounds to s decimal places
253
254`strpos(string, substring) <http://www.postgresql.org/docs/current/static/functions-string.html>`_: PostgreSQL string function that returns an integer location of a specified substring.
255
256`sum(expression) <http://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE>`_: PostgreSQL aggregate function that returns the sum of records in a set of records.
257
258.. rubric:: Footnotes
259
260.. [#PostGIS_Doco] http://postgis.org/documentation/manual-1.5/
261
Note: See TracBrowser for help on using the repository browser.