Alexander Klapheke

I’m a data scientist and self-taught coder (Python, R, Haskell) with a background in linguistics and math and over 10 years’ experience explaining technical concepts to laypeople.

Cambridge street trees

2,139 words CC-BY 📎 code View source

This project now features an interactive map.

I think that I shall never see
A billboard lovely as a tree.
Indeed, unless the billboards fall
I’ll never see a tree at all.

1 Introduction

Every spring, the Callery pear down the street from me displays a magnificent panoply of white blossoms, and releases… let’s say a distinct odor. A few months later, the tree of heaven in my backyard drops an incredible torrent of flowers onto my porch, and within a few months, another torrent of seed pods (this tree has it’s own smell which is marginally less unpleasant). It’s a subtle difference between a street lined with, say, honeylocusts and one lined with sycamores, but trees really do go a long way in defining the character of a place, making them a crucial consideration in urban planning.

Like many cities, Cambridge has long maintained a dataset of all municipal trees, including the species, location, date planted, and diameter.1 By exploring this data, we can see what trends and patterns emerge, and learn something about the arboreal life of our cities.

2 Cleaning

Although the dataset does seem to cover every public tree, much of the data is missing. For example, 83% of trees are missing the date they were planted, presumably at least in part because they predate the record system (a few of those that do list dates are from the 1970s).

There are many spurious values as well. Besides misaligned species names (to be discussed in sec. 3), there are several misspelled genera (e.g., “Planatus” for Platanus). The diameter of each tree is listed, but the highest values are 154, 715, 745, 915, and 945 inches, corresponding respectively to about 13, 60, 62, 76, and 79 feet (the stoutest tree in the world is only 46 feet in diameter), casting all values into some suspicion. The dataset also lists properties such as number of trunks—one katsura tree has ten, and two, a Japanese tree lilac and a serviceberry have eleven—and things of municipal importance, such as whether there are overhead wires, or whether the roots are emerging through the sidewalk.

I also included data about invasiveness, by grabbing the list of species considered invasive in Massachusetts, copying the scientific names to a text file, and merging.

The SiteType column helpfully tells us whether the tree is, in fact, a tree at all, or a stump, a plot being prepared for a tree, etc. After cleaning, I removing all of the latter cases—although really, the cleaning was a process ongoing until the end.

3 Varieties

The obvious first thing to do after cleaning is look at what varieties line the city’s streets. fig. 1 shows the most popular genera as recorded in the dataset (their corresponding common names are in Appendix A).

Figure 1: The top 20 genera of street tree in Cambridge
Figure 1: The top 20 genera of street tree in Cambridge

However, just knowing that the streets are lined with maples (Acer spp.) doesn’t tell us if they are towering, stately silver maples (A. saccharinum), or splendiferous red maples (A. rubrum), or invasive Norway maples (A. platanoides), nor does knowing there are over 3,000 oaks (Quercus spp.) cast light on which of the over 600 species they comprise. We can break this graph down further by species (fig. 2).

Figure 2: The top 20 genera of street tree in Cambridge, broken down by species (hover over each bar to see the specific name)
Figure 2: The top 20 genera of street tree in Cambridge, broken down by species (hover over each bar to see the specific name)

There are some interesting takeaways here. For instance, not a single public apple tree (genus Malus) has been identified for species. This indicates that the data was not logged at the time of planting; possibly, the species were identified by sight in a later survey (ornamental apple trees are typically hard-to-identify hybrids). Some of the “unknown species” labels are clearly in error. Liquidambar, for example, contains only four species, only one of which (L. styraciflua) is grown ornamentally in the U.S., and is easily distinguished from its relatives by its five-pointed leaves.2 More alarmingly, Ginkgo biloba is the only extant species in its entire class (a sister clade to the conifers), but 79 specimens are missing species information, and one is apparently of Ginkgo triacanthos! Presumably, city arborists have not discovered a new living fossil, but rather this row has gotten mixed up somehow (maybe during merging) with a honeylocust, Gleditsia triacanthos. We also see such botanical novelties as Ulmus cordata, Tilia calleryana, and, amusingly, Acer acerifolia—literally, a maple-leafed maple.

Besides the missing and spurious data, we see that over a third of city-planted maples are Norway maples, which are now illegal in Massachusetts due to their propensity to crowd out native species. In all, 2,282 trees considered invasive, representing five species, are present in the dataset.

Invasive plants are not the only menace to biodiversity. We are lucky to have over a thousand elms in Cambridge—after Dutch elm disease ravaged populations in the 1960s and ’70s, disease-resistant cultivars of Ulmus americana were developed which, along with fungicide, managed the destruction. The chestnut blight fungus, introduced from East Asia around the turn of the 20th century, has made the American chestnut (Castanea dentata) all but extinct, with a onetime population of several billion dwindling to a few hundred today. So I was surprised to see four trees in the genus Castanea in the dataset, but less surprised when three turned out to be misidentified horse chestnuts (Aesculus hippocastanum),3 and the fourth was… wait for it… a Norway maple.

4 Mapping

Every tree in the dataset has an associated latitude and longitude. Indeed, the coordinates given for each tree display incredible precision, giving no fewer than 15 decimal places, making them accurate to about 100 picometers, or approximately an atomic radius (I have no idea how they chose which atom of the tree to use as a reference). In order to map trees against Cambridge’s basemap shapefiles, I used PyShp to parse the shapefiles and PyCRS and PyProj to convert from WGS84 (i.e., standard latitude and longitude) to NAD 1983, the coordinate system used in the shapefiles.

In fig. 3, we can see the locations of invasive species of tree. Norway maples (Acer platanoides) are predictably strewn across the city, but we see clusters of trees of heaven (Ailanthus altissima), and a few sizeable black locusts (Robinia pseudoacacia), their impressive diameters indicating older trees. Of course, many more specimens of these noxious species exist on private land, and many of the worst invasive species are not trees at all, such as reeds (Phragmites australis) and garlic mustard (Alliaria petiolata).

Figure 3: Locations of public trees in Cambridge considered invasive. Marker size is proportional to tree diameter.   Acer platanoides (2,121)  Robinia pseudoacacia (55)  Phellodendron amurense (54)  Ailanthus altissima (50)  Rhamnus cathartica (2)
Figure 3: Locations of public trees in Cambridge considered invasive. Marker size is proportional to tree diameter.

 Acer platanoides (2,121)
 Robinia pseudoacacia (55)
 Phellodendron amurense (54)
 Ailanthus altissima (50)
 Rhamnus cathartica (2)

To see what trees are popular in different areas, I downloaded the neighborhood boundary shapefiles, and used Shapely to categorize each latitude–longitude pair as being in one of the resulting polygons (dismayingly, although the dataset has a Neighborhood column, it doesn’t contain a single data point). By filtering, we can see what the most popular species is in each neighborhood.

Figure 4: Most popular tree per neighborhood.   Gleditsia triacanthos  Acer (unknown species)  Acer rubrum  Fraxinus americana  Acer platanoides
Figure 4: Most popular tree per neighborhood.

 Gleditsia triacanthos
 Acer (unknown species)
 Acer rubrum
 Fraxinus americana
 Acer platanoides

We can get an even more granular look by plotting each individual tree:

Figure 5: Distributions of each neighborhood’s most popular tree. Marker size is proportional to tree diameter.   Gleditsia triacanthos  Acer (unknown species)  Acer rubrum  Fraxinus americana  Acer platanoides
Figure 5: Distributions of each neighborhood’s most popular tree. Marker size is proportional to tree diameter.

 Gleditsia triacanthos
 Acer (unknown species)
 Acer rubrum
 Fraxinus americana
 Acer platanoides

Here we see some interesting patterns. The unidentified maples so popular around MIT almost exclusively line the Charles River embankment.4 The white ashes (Fraxinus americana) in the Cambridge Highlands are concentrated in the Lusitania Woods by Fresh Pond. The other neighborhoods, though, are more chaotic, are favor generally popular choices such as honeylocust. Mapping trees by census block brings this wonderful chaos to life:

Figure 6: Most popular tree per census block.
Figure 6: Most popular tree per census block.

5 Planting dates

As mentioned above, only 17% of trees have a plant date listed, making it difficult to find trends. Of course, it is still interesting to examine the data we have—fig. 7 shows recorded plantings for the genera in fig. 1 starting in 2007 since there are only a handful of data points before then.

Figure 7: Planting rate per year of 20 most popular genera since 2007, for those recorded.
Figure 7: Planting rate per year of 20 most popular genera since 2007, for those recorded.

In particular, it is to determine when invasives were planted—of the 2,282 invasive trees in the dataset, only five Norway maples have a date listed. This is unsurprising—the “invasiveness” of invasives is precisely their ability to aggressively reproduce. However, even of these, we can see in fig. 8 that two were planted in 2013 and 2014, respectively, well after the 2006 law prohibiting such species took effect.

Figure 8: Planting dates of Acer platanoides, for those recorded.   Law banning invasive species in effect
Figure 8: Planting dates of Acer platanoides, for those recorded.

 Law banning invasive species in effect

There are some interesting trends visible in fig. 7—a recent fad for magnolias, a steady increase in serviceberry plantings (Amelanchier spp.), and spikes in pear tree plantings (nearly all of which are Callery pears, Pyrus calleryana) in 2009 and 2017—although given the paucity of data, it is impossible to say if they are real.

We can get another sense of trends by looking at location, rather than genus.

Figure 9: Planting dates of trees by location since 2007, for those with a date listed. Marker size is proportional to tree diameter.  2007  2020
Figure 9: Planting dates of trees by location since 2007, for those with a date listed. Marker size is proportional to tree diameter.

2007

2020

We can see several strings of orange and yellow where a whole row of trees was planted all at once. In addition, unsurprisingly, the largest trees tend to be the oldest, shown in purple.

There is much more I would like to do with this data—and much more, and cleaner, data I’d like to have—but I’ve learned a lot about Cambridge’s trees, especially about the prevalence of invasive species on my local streets.

Appendix A: Common names of trees

Table 1: Common names of the tree genera found in Cambridge. Note that, in general, there is not a one-to-one correspondence between common and scientific names, which is why the latter are generally preferred.
Genus Common name Family (APG IV5) Population
Abies fir Pinaceae 46
Acer maple Sapindaceae 4,875
Aesculus horse chestnut Sapindaceae 84
Ailanthus tree of heaven Simaroubaceae 67
Amelanchier serviceberry Rosaceae 499
Betula birch Betulaceae 374
Carpinus hornbeam Betulaceae 205
Carya hickory Juglandaceae 7
Castanea chestnut Fagaceae 4
Catalpa catalpa Bignoniaceae 27
Cedrus cedar Pinaceae 2
Celtis hackberry Cannabaceae 155
Cercidiphyllum katsura Cercidiphyllaceae 79
Cercis redbud Fabaceae 169
Chionanthus fringetre Oleaceae 5
Cladrastis yellowwood Fabaceae 122
Cornus dogwood Cornaceae 504
Corylus hazel Betulaceae 3
Cotinus smoketree Anacardiaceae 16
Crataegus hawthorn Rosaceae 82
Enkianthus enkianthus Ericaceae 1
Eucommia Chinese rubber tree Eucommiaceae 51
Fagus beech Fagaceae 130
Fraxinus ash Oleaceae 1,427
Ginkgo ginkgo Ginkgoaceae 467
Gleditsia honeylocust Fabaceae 2,637
Gymnocladus coffeetree Fabaceae 171
Halesia silverbell Styracaceae 9
Hamamelis witch hazel Hamamelidaceae 46
Ilex holly Aquifoliaceae 33
Juglans walnut Juglandaceae 19
Juniperus juniper Cupressaceae 104
Koelreuteria golden rain tree Sapindaceae 106
Laburnum golden rain tree Fabaceae 1
Larix larch Pinaceae 17
Liquidambar sweetgum Altingiaceae 377
Liriodendron tuliptree Magnoliaceae 177
Maackia maackia Fabaceae 26
Magnolia magnolia Magnoliaceae 244
Malus apple Rosaceae 762
Metasequoia dawn redwood Cupressaceae 49
Morus mulberry Moraceae 33
Nyssa tupelo Nyssaceae 25
Ostrya hophornbeam Betulaceae 40
Oxydendrum sourwood Ericaceae 15
Parrotia ironwood Hamamelidaceae 8
Phellodendron cork tree Rutaceae 59
Picea spruce Pinaceae 105
Pinus pine Pinaceae 964
Platanus sycamore Platanaceae 1,244
Populus poplar/aspen Salicaceae 192
Prunus cherry/plum Rosaceae 1,092
Pseudotsuga Douglas fir Pinaceae 11
Ptelea hoptree Rutaceae 1
Pyrus pear Rosaceae 1,072
Quercus oak Fagaceae 3,015
Rhamnus buckthorn Rhamnaceae 4
Robinia black locust Fabaceae 59
Salix willow Salicaceae 38
Sassafras sassafras Lauraceae 1
Sciadopitys umbrella pine Sciadopityaceae 1
Sorbus rowan Rosaceae 1
Stewartia tea tree Theaceae 19
Styphnolobium pagoda tree Fabaceae 357
Styrax storax Styracaceae 1
Syringa lilac Oleaceae 422
Taxus yew Taxaceae 26
Thuja arborvitae Cupressaceae 142
Tilia linden Malvaceae 1,544
Tsuga hemlock Pinaceae 101
Ulmus elm Ulmaceae 1,204
Viburnum viburnum Adoxaceae 20
Zelkova zelkova Ulmaceae 627

  1. They’ve also compiled it into quite a nice browsable map.

    OpenTrees.org maintains a collection of such datasets (possible fodder for future projects) as well as a browsable map of all of them.

  2. Like so (source)

  3. An unrelated tree whose toxic fruit (left) looks very similar to that of a true chestnut, C. sativa (right) (sources: 1, 2).

  4. Although unidentified, we can presume based on the placement that these are of the same species.

  5. The Angiosperm Phylogeny Group, “An Update of the Angiosperm Phylogeny Group Classification for the Orders and Families of Flowering Plants: APG IV,” Botanical Journal of the Linnean Society 181, no. 1 (May 2016): 1–20, doi:10.1111/boj.12385.