The KDE Smoothing Parameter: Approaching the Core Issue

When estimating an individual's space use with kernel density estimation (KDE), the smoothing parameter h must be specified. The choice of method for calculating h has a dramatic effect on the resulting estimate. Here I argue that the search for an optimal algorithm for h is probably a blind alley, for reasons other than those generally acknowledged.

Two methods are used extensively for KDE home-range analysis: least squares cross-validation (LSCV) and the reference method (Href), which gives the optimal h for a standard multivariate normal distribution. In short, both methods have been found to have serious drawbacks. In particular, LSCV generally under-smooths the home range representation, leading to a utilization distribution (UD) that tends to be fragmented, with many local “peaks”. Href, on the other hand, tends to over-smooth the UD. Thus, relative to LSCV, the resulting UD suppresses the local peaks in density of fixes and tends to show a larger home range for a given isopleth. The literature on these issues, including proposals for alternative methods, is huge. Since most ecologists working on animal space use are aware of this methodological minefield, I limit myself to referring to Horne and Garton (2006).
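To make the sensitivity to h concrete, here is a minimal Python sketch, not a definitive implementation. It assumes the reference bandwidth takes the form commonly used in home-range software, h = sigma * n^(-1/6) with sigma^2 the mean of the x and y variances; that formula, the simulated fixes and all function names are my assumptions for illustration. The smaller bandwidth is simply an arbitrary fraction of Href standing in for an under-smoothing choice; it is not LSCV.

```python
import numpy as np


def href_bandwidth(xy):
    """Assumed reference bandwidth: h = sigma * n**(-1/6),
    where sigma**2 is the mean of the x and y sample variances."""
    n = xy.shape[0]
    sigma = np.sqrt(0.5 * (xy[:, 0].var(ddof=1) + xy[:, 1].var(ddof=1)))
    return sigma * n ** (-1.0 / 6.0)


def kde_on_grid(xy, h, grid_x, grid_y):
    """Fixed-bandwidth bivariate Gaussian KDE evaluated on a regular grid."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    dx = gx[..., None] - xy[:, 0]
    dy = gy[..., None] - xy[:, 1]
    kernel = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * h ** 2)) / (2.0 * np.pi * h ** 2)
    return kernel.mean(axis=-1)


def isopleth_area(density, cell_area, level=0.95):
    """Area of the smallest set of grid cells holding `level` of the gridded volume."""
    d = np.sort(density.ravel())[::-1]
    cum = np.cumsum(d) / d.sum()
    n_cells = int(np.searchsorted(cum, level)) + 1
    return n_cells * cell_area


# Hypothetical fixes: three tight clusters of 100 relocations each.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
fixes = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centres])

step = 0.2
grid_x = np.arange(-6.0, 16.0 + step, step)
grid_y = np.arange(-6.0, 14.0 + step, step)

h_ref = href_bandwidth(fixes)
h_small = 0.25 * h_ref  # arbitrary under-smoothing stand-in, NOT LSCV

area_ref = isopleth_area(kde_on_grid(fixes, h_ref, grid_x, grid_y), step ** 2)
area_small = isopleth_area(kde_on_grid(fixes, h_small, grid_x, grid_y), step ** 2)

print(f"h_ref = {h_ref:.2f}, 95% isopleth area = {area_ref:.1f}")
print(f"h_small = {h_small:.2f}, 95% isopleth area = {area_small:.1f}")
```

With the same set of fixes, the larger bandwidth yields a clearly larger 95% isopleth area, which is the pattern described above for Href relative to a less smoothing choice.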

This cormorant Phalacrocorax carbo regularly revisited a given bay and a given part of its shoreline, offering a good opportunity for a patient photographer. The bird’s fishing success at this particular location was substantial, which illustrates nicely how spatial memory – and in particular the concept of subjective habitat auto-facilitation (Gautestad and Mysterud, 2010) – plays an important role in vertebrates’ space use activities. However, such self-reinforcing revisits to patches undermine the statistics-theoretical foundation for KDE as a descriptor of habitat selection. Photo: AOG.

The core problem with the KDE approach is, in my view, not how to balance over- and under-smoothing of the UD. All KDE variants rest on the common assumption that the animal in question has utilized its habitat in a Markov-compliant manner. This is more serious than the h issue. I refer to my book and a series of previous blog posts for an explanation of a Markov process, with numerous examples provided. In short, Markov compliance is in the present context a mathematical form of process that is statistically “compatible” with the statistical kernel functions, which represent the backbone of any KDE. Such a process in the context of a home range implies temporally scale-specific (“mechanistic”) habitat utilization. In the limit of large samples of relocations of the animal, such a mechanistic process leads to convergence towards a “smooth” UD in statistical terms. That is, even a multi-modal UD may be assumed to be locally “flat” upon zooming sufficiently far into the UD’s functional surface.

So far, absolutely all theoretical developments within the KDE arena rest on this “smooth UD surface” statistical-mechanical assumption.

I thus conclude that KDE is not an appropriate approach for data collected from an animal that has utilized its habitat in a complex manner. In another post, "What About Intra-Home Range Fix Density?" (Search Archive), I provided support for this view, using data from free-ranging sheep. In other words, if an animal has utilized spatial memory, it generates a home range as an emergent property of the movement process. Further, if the animal has integrated its spatial and historic information to allow for multi-scaled space use, the UD will no longer be smooth (as shown by the sheep data and, for example, data on red deer; Gautestad et al., 2013). The UD will instead describe a statistical fractal; i.e., it will be mathematically rugged at all resolutions, in a self-similar manner.

KDE will thus never be able to describe a multi-scaled home range pattern realistically, from the perspective of local intensity of habitat use. Other approaches are needed. For example, as an alternative to KDE’s isopleths I generally advocate using incidence, I; that is, studying the number and spatial distribution of non-empty grid cells obtained by superimposing a virtual grid onto the spatial scatter of fixes. In other posts I have in this regard described a method to find the optimal grid resolution, leading to a formula that can be applied to estimate the animal’s characteristic scale of space use (CSSU) under the given conditions.
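Counting incidence is simple enough to sketch in a few lines of Python. The function name, the grid origin and the toy coordinates below are hypothetical, and the sketch covers only the counting step; the choice of optimal grid resolution and the CSSU formula are described in the other posts and are not reproduced here.

```python
import numpy as np


def incidence(xy, cell_size, origin=(0.0, 0.0)):
    """Number of non-empty cells when a square grid is laid over the fixes."""
    shifted = np.asarray(xy, dtype=float) - np.asarray(origin, dtype=float)
    cells = np.floor(shifted / cell_size).astype(int)  # (column, row) index per fix
    return len(np.unique(cells, axis=0))               # count distinct occupied cells


# Hypothetical fixes, in arbitrary length units.
fixes = np.array([[0.1, 0.1], [0.2, 0.3], [1.5, 0.2], [3.7, 3.9]])

for cell_size in (0.5, 1.0, 4.0):
    print(cell_size, incidence(fixes, cell_size))
```

Repeating the count over a range of cell sizes shows how I depends on the resolution of the virtual grid, which is why the resolution has to be chosen with care.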

Applying CSSU analysis will reveal that two sections of a home range may show similar density of fixes but different local magnitudes of CSSU. A relatively small CSSU for a given density implies a higher degree of intra-section “clumping” of fixes (thus, 1/CSSU expresses intensity of habitat utilization, independent of fix density per se).

Similarly, two sections may show a strong difference in density but a similar magnitude of CSSU, and thus a similar intensity of habitat utilization despite the density difference. For example, 1/CSSU may be large (CSSU small) within specific sections of the periphery of the home range. In this case, despite low fix density, the animal has shown more “surgical” space use inside this section during its visits. I refer to previous posts for more details.
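The first of the two contrasts above, equal fix density but different clumping, can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the two simulated sections, the cell size and the function name are hypothetical, and the code compares incidence at a single grid resolution rather than computing CSSU itself.

```python
import numpy as np


def incidence(xy, cell_size):
    """Number of non-empty cells when a square grid is laid over the fixes."""
    cells = np.floor(np.asarray(xy, dtype=float) / cell_size).astype(int)
    return len(np.unique(cells, axis=0))


rng = np.random.default_rng(7)
n_fixes = 200
side = 10.0  # both hypothetical sections are 10 x 10 length units

# Section A: fixes spread evenly over the section.
spread = rng.uniform(0.0, side, size=(n_fixes, 2))

# Section B: the same number of fixes, concentrated around four patches.
centres = np.array([[2.5, 2.5], [7.5, 2.5], [2.5, 7.5], [7.5, 7.5]])
clumped = np.vstack(
    [c + rng.normal(scale=0.3, size=(n_fixes // 4, 2)) for c in centres]
)

density_spread = len(spread) / side ** 2
density_clumped = len(clumped) / side ** 2
i_spread = incidence(spread, 1.0)
i_clumped = incidence(clumped, 1.0)

print(f"spread:  density = {density_spread:.2f}, incidence = {i_spread}")
print(f"clumped: density = {density_clumped:.2f}, incidence = {i_clumped}")
```

Both sections have identical fix density, yet the clumped section occupies far fewer grid cells. Density alone therefore cannot distinguish the two patterns of use.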

REFERENCES

Gautestad, A. O. and I. Mysterud. 2010. Spatial memory, habitat auto-facilitation and the emergence of fractal home range patterns. Ecological Modelling 221:2741-2750.

Gautestad, A. O., L. E. Loe, and A. Mysterud. 2013. Inferring spatial memory and spatiotemporal scaling from GPS data: comparing red deer Cervus elaphus movements with simulation models. Journal of Animal Ecology 82:572-586.

Horne, J. S. and E. O. Garton. 2006. Likelihood cross-validation versus least squares cross-validation for choosing the smoothing parameter in kernel home-range analysis. Journal of Wildlife Management 70:641-648.