Biodiversity Knowledge Maps

Together with the digitization of natural history specimens accumulated over the centuries, the technological revolution has brought big data opportunities to the field of biodiversity. The most abundant and detailed information in history on the distribution of species is now just a few clicks away. However, the sheer amount of information can itself compromise the credibility of biodiversity maps.

From natural history to i-natural

Figure 1. Biodiversity platforms draw on multiple sources and make a staggering volume of information available to the user. However, the information must be critically analyzed to eliminate possible errors. This process can result in a significant loss of information, as the information that is actually usable for analysis may be much smaller than the original dataset. Shapefile of the map of Euskal Herria taken from https://www.euskalgeo.eus/.

Edward O. Wilson proposed the term biophilia to describe the innate affinity of human beings for other living things. Once food and shelter were guaranteed, humans began to investigate and classify living beings from a non-utilitarian perspective [1]. Since then we have been working for centuries to complete and understand the puzzle of biodiversity.

The 17th and 18th centuries witnessed numerous explorations of nature. Public exhibitions of biological collections also began in those times. Examples include the Museum of Natural History of Paris, founded in 1635, and the Kunstkamera, founded about 300 years ago by Peter I of Russia, in which animal collections from all over the world were exhibited (including unusual animal forms) [2].

In addition to abundant discoveries of new species, the 19th century left us pioneering works on the distribution of life forms (biogeography) and their origin (evolution). Humboldt, Wallace and Darwin all relied on the observation of biodiversity to develop their ideas; in trying to answer questions about biodiversity, they placed it at the core of their work.

In the 20th century, the cataloguing of species and the documentation of their distribution continued. By 2000, museums and herbaria held about 3 billion specimens [3].

Although the legacy of the centuries is impressive, the technological revolution of recent decades has dwarfed it. The development of data repositories and digital tools (apps, GPS, smartphones) has allowed a massive collection of information, while achieving the “democratization” of biodiversity cataloguing. At present, most species distribution data are collected by volunteers, on a scale unlike anything seen before. For example, more than 28,000 people from 170 countries participated in the Global Big Day 2018, contributing 1.6 million records of 6,899 bird species (two thirds of all known species) [4].

Biodiversity maps at a click

Figure 2. The taxonomic bias of the information is evident in the Basque Country: the number of bird and fish records is much higher than that of plants, although the latter group is much more diverse. Source of information: gbif.org (data collected on 20 January 2020; DOI: 10.15468/dl.lidovq 0001371-200127171203522).

Along with the development of new forms of data collection, recent years have seen important paradigm shifts in data ownership. Millions of records held in museums and herbaria have been opened up, without restrictions and free of charge for the user. All this has facilitated the development of information exchange platforms [5]. The Global Biodiversity Information Facility (GBIF) is the world's best-known platform; it has made more than 1 billion unified and standardized species records available for online consultation.

The availability of this information creates new opportunities in areas such as the economy (e.g. through wildlife tourism), education, ethnobotany and, of course, biodiversity conservation. Indeed, biodiversity maps are critical to addressing many ecological and conservation questions, including: How are species distributed? What role do present and past environmental conditions and geographical and ecological factors play in this distribution pattern? Where are the hotspots of biodiversity? How has the distribution of a given species (for example, an invasive or threatened species) changed? How will it change in the future? Of course, the answers to all these questions can be of paramount importance in addressing the consequences of anthropogenic change [6].

Although so far we have praised the amount of information available, the biggest challenge is to guarantee the quality of the information itself. Any piece of information, whether collected in the field or taken from an old source (including gray literature), must be validated before it is approved for digitization. Most data platforms follow strict protocols for detecting and cleaning possible errors (e.g. taxonomic errors or errors associated with georeferencing), and there are numerous support tools for post-processing the information (for taxonomic homogenization, for cleaning duplicate and wrongly georeferenced data, etc.). However, bringing the information up to a minimum level of quality is usually no small task and, sometimes, these screening operations can mean excluding much of the information. Therefore, in addition to quantity, quality will limit the size of the effective information (Figure 1).
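The kind of screening described above can be illustrated with a minimal sketch. It assumes each occurrence record is a dictionary with hypothetical `species`, `lat`, `lon` and `year` fields; the field names and the three checks (missing name, implausible coordinates, exact duplicates) are illustrative assumptions, not the protocol of any particular platform. It also counts how many records each filter discards, which shows how the effective dataset shrinks.

```python
def clean_records(records):
    """Return (kept, rejected) after basic quality checks on occurrence records.

    Field names ('species', 'lat', 'lon', 'year') are hypothetical.
    """
    kept = []
    seen = set()
    rejected = {"no_name": 0, "bad_coords": 0, "duplicate": 0}
    for rec in records:
        name = (rec.get("species") or "").strip()
        lat, lon = rec.get("lat"), rec.get("lon")
        # 1. Taxonomic check: a record without a species name is unusable.
        if not name:
            rejected["no_name"] += 1
            continue
        # 2. Georeferencing check: coordinates must exist, be in range,
        #    and not be the (0, 0) placeholder that often marks missing data.
        coords_ok = (
            isinstance(lat, (int, float))
            and isinstance(lon, (int, float))
            and -90 <= lat <= 90
            and -180 <= lon <= 180
            and not (lat == 0 and lon == 0)
        )
        if not coords_ok:
            rejected["bad_coords"] += 1
            continue
        # 3. Duplicate check: same species, place (about 10 m) and year.
        key = (name.lower(), round(lat, 4), round(lon, 4), rec.get("year"))
        if key in seen:
            rejected["duplicate"] += 1
            continue
        seen.add(key)
        kept.append(rec)
    return kept, rejected


if __name__ == "__main__":
    raw = [
        {"species": "Parus major", "lat": 43.0, "lon": -2.0, "year": 2015},
        {"species": "Parus major", "lat": 43.0, "lon": -2.0, "year": 2015},
        {"species": "", "lat": 43.1, "lon": -2.1, "year": 2016},
        {"species": "Helix aspersa", "lat": 0.0, "lon": 0.0, "year": 2017},
    ]
    kept, rejected = clean_records(raw)
    print(len(raw), "raw records,", len(kept), "kept;", rejected)
```

Real cleaning pipelines go much further (taxonomic name matching, checks against country borders, and so on); the point here is only that every filter removes records.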

The ailments of big data: biases in biodiversity data

Since most specimens and species occurrence records were collected without any planned design, the information is not evenly distributed among taxonomic groups, in space or in time [7]. In addition, combining diverse sources of information can increase this heterogeneity, to the point of compromising the credibility of biodiversity maps.

Traditionally, some taxonomic groups have received more attention than others (Figure 2). Although both experiences can be fascinating, people are more inclined to watch birds than to collect snails. Likewise, because rare, emblematic or threatened species are the treasures most prized by nature lovers, they are documented more intensively, as reflected in the number of records on biodiversity platforms.

Some geographic areas are also better sampled than others. With few exceptions, the hottest spots of information are protected natural areas. Prized species often act as bait, attracting nature lovers to the places where they live [8]. In other cases the reasons are logistical: proximity to the home of the main data collector, the existence of a public access area, accessibility, or the existence of a long-term monitoring site [9].

Likewise, both taxonomic and spatial biases can vary over time. As a result, the past distribution of a species may be well known while its current distribution is not (or the other way around). In the most serious case, the information may become obsolete. Therefore, it is advisable to use only records from a period of time that realistically fits the research objective, even though this may mean a significant loss of information.

Figure 3. From a floristic point of view, the Ordesa y Monte Perdido National Park is the most intensively sampled area of the Iberian Peninsula. However, the sampling effort is not homogeneous across space. Because of this spatial bias, the map of plant diversity based on raw data largely reflects the number of records. The knowledge map lets us visualize the credibility of the diversity map and identify the places that need more intensive sampling. Source of information: Jaca herbarium (http://projectos.ipe.csic.es/floragon/index.php), adapted from [12].

The reasons for taxonomic, spatial and temporal biases can be very diverse, depending on the case and the scale. Whatever the reasons, it is essential to identify these biases and assess their impact in order to carry out credible biodiversity analyses. In this sense, analyzing ignorance can be as important as analyzing knowledge itself.

Credibility of biodiversity maps

Spatially uneven sampling of biodiversity is reflected in databases: some spatial units have more records than others. On a global biodiversity map, the effect of sampling effort will be smaller because biodiversity gradients are more marked. However, at medium or small scales, the distortion caused by sampling effort will be much more serious: the hottest spots of biodiversity will correspond to the most intensively sampled places, while the coldspots will correspond to the poorly sampled ones (Figure 3). Behind this result are species accumulation curves, which have the same shape as enzyme saturation curves: the first samples taken in a geographical unit yield many new species; later samples yield fewer and fewer, until no more new species are found in the unit (saturation).
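The saturating shape of the accumulation curve can be reproduced with a short sketch. It assumes the data for one geographical unit are available as a list of samples, each sample being the set of species recorded in it; the function name, the number of permutations and the toy data are illustrative assumptions. Because the order of the samples is arbitrary, the curve is averaged over many random orderings.

```python
import random


def accumulation_curve(samples, permutations=200, seed=0):
    """Mean cumulative number of species after 1..n samples.

    `samples` is a list of sets of species names for one geographical unit.
    The curve is averaged over random orderings of the samples.
    """
    rng = random.Random(seed)
    order = list(samples)
    totals = [0.0] * len(order)
    for _ in range(permutations):
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen.update(sample)       # species found so far in this ordering
            totals[i] += len(seen)    # cumulative richness after i + 1 samples
    return [t / permutations for t in totals]


if __name__ == "__main__":
    # Toy unit: common species turn up in most samples, rare ones only once.
    unit = [
        {"oak", "beech", "holly"},
        {"oak", "beech"},
        {"oak", "holly", "yew"},
        {"beech", "oak"},
        {"oak", "beech", "holly"},
        {"oak", "orchid"},
    ]
    for n, richness in enumerate(accumulation_curve(unit), start=1):
        print(n, round(richness, 2))
```

In a well-sampled unit the last steps of the curve add almost nothing; in a poorly sampled one the curve is still rising steeply at the final sample.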

If we know how far along the species accumulation process is, we can estimate the degree of knowledge (or degree of ignorance) of the species list. In this way (or with other, non-parametric methods; see [10]) it is possible to measure whether each geographic unit is well or poorly sampled and, in turn, to determine the credibility of the biodiversity map [11]. From these measurements we can create a knowledge map that helps us judge whether the information is useful for answering the question at hand. In some cases, biodiversity analyses may be limited to the “well-sampled” units [12], while in other cases the need for more data will be evident.
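As an illustration of the non-parametric route, the sketch below estimates the completeness of each spatial unit as observed richness divided by estimated richness, using the bias-corrected Chao2 incidence estimator. The choice of Chao2, the 0.7 threshold for calling a unit "well sampled", and all function and cell names are assumptions made for this example; they are not necessarily the methods used in [10], [11] or [12].

```python
from collections import Counter


def chao2_completeness(samples):
    """Observed richness / Chao2-estimated richness for one spatial unit.

    `samples` is a list of sets of species names. Returns a value in (0, 1],
    or 0.0 when the unit has no samples or no species.
    """
    m = len(samples)
    incidence = Counter(sp for sample in samples for sp in set(sample))
    s_obs = len(incidence)
    if m == 0 or s_obs == 0:
        return 0.0
    q1 = sum(1 for c in incidence.values() if c == 1)  # species in one sample
    q2 = sum(1 for c in incidence.values() if c == 2)  # species in two samples
    # Bias-corrected Chao2: many single-sample species imply many unseen ones.
    s_est = s_obs + (m - 1) * q1 * (q1 - 1) / (2 * m * (q2 + 1))
    return s_obs / s_est


def knowledge_map(cells, threshold=0.7):
    """Map each cell id to (completeness, is_well_sampled).

    The threshold is an arbitrary illustrative cut-off.
    """
    result = {}
    for cell, samples in cells.items():
        completeness = chao2_completeness(samples)
        result[cell] = (completeness, completeness >= threshold)
    return result


if __name__ == "__main__":
    cells = {
        "cell_A": [{"a", "b"}, {"a"}, {"a", "c"}, {"a", "b"}, {"c", "b"}],
        "cell_B": [{"a", "b"}, {"c"}, {"d", "e"}],
        "cell_C": [],
    }
    for cell, (completeness, ok) in knowledge_map(cells).items():
        print(cell, round(completeness, 2), "well sampled" if ok else "needs sampling")
```

Plotting the completeness value per cell gives the knowledge map; restricting an analysis to the cells flagged as well sampled is one way of limiting it to reliable units.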

To fill the information gaps, the use of species distribution models has been proposed. However, the models are far from perfect and, in most cases, adding more uncertainty to a map that already has a high degree of uncertainty does not seem the best alternative [8]. So what can be done? The harsh reality is that the only options are to digitize new data (only 1% of the data in herbaria and museums are georeferenced [13]) or to go out into the field and collect more. There is good news, however: knowledge maps can be an effective tool for planning and optimizing new sampling. In fact, promoting sampling in little-known places maximizes the contribution of new records to the development of reliable biodiversity maps [14].

In the age of big data, encouraging the critical use of information is as important as ensuring access to it. Measuring knowledge can be the starting point for assessing the credibility of biodiversity maps, as well as for designing the sampling needed to complete them. However, non-specialized users (managers, professionals, natural science researchers, etc.) currently have to carry out serious programming and modeling work to produce such maps. It is therefore time to break that bottleneck and extend the use of knowledge maps to different areas, for example with tools that visualize the estimated sampling level and its spatial distribution (e.g. interactive web applications).

Knowing how many pieces of the biodiversity puzzle are missing, and where, can be a solid starting point for advancing knowledge, as well as an honest exercise in setting aside overly ambitious goals.

Bibliography

[1] Anderson, J.G.T., 2017. Why Ecology Needs Natural History. American Scientist. URL: https://www.americanscientist.org/article/why-ecology-needs-natural-history.

[2] Pyke, G.H., Ehrlich, P.R., 2010. Biological collections and ecological/environmental research: a review, some observations and a look to the future. Biological Reviews 85, 247–266.

[3] Krishtalka, L., Humphrey, P.S., 2000. Can Natural History Museums Capture the Future? BioScience 50, 611–617.

[4] eBird, T., 2018. Global Big Day 2018: a birding world record - eBird. URL: https://ebird.org/ebird/news/global-big-day-2018-a-birding-world-record.

[5] Edwards, J.L., Lane, M.A., Nielsen, E.S., 2000. Interoperability of Biodiversity Databases: Biodiversity Information on Every Desktop. Science 289, 2312–2314.

[6] Graham, C.H., Ferrier, S., Huettman, F., Moritz, C., Peterson, A.T., 2004. New developments in museum-based informatics and applications in biodiversity analysis. Trends in Ecology & Evolution 19, 497–503.

[7] Rocchini, D., Hortal, J., Lengyel, S., Lobo, J.M., Jiménez-Valverde, A., Ricotta, C., Bacaro, G., Chiarucci, A., 2011. Accounting for uncertainty when mapping species distributions: the need for maps of ignorance. Progress in Physical Geography 35, 211–226.

[8] Nekola, J.C., Hutchins, B.T., Schofield, A., Najev, B., Perez, C.E., 2019. Caveat consumptor notitia museo: Let the museum data user beware. Global Ecology and Biogeography 28, 1722–1734.

[9] Dennis, R.L.H., Sparks, T.H., Hardy, P.B., 1999. Bias in butterfly distribution maps: the effects of sampling effort. Journal of Insect Conservation 3, 33–42.

[10] Sousa-Baena, M.S., Garcia, L.C., Peterson, A.T., 2014. Completeness of digital accessible knowledge of the plants of Brazil and priorities for survey and inventory. Diversity and Distributions 20, 369–381.

[11] Pardo, I., Roquet, C., Lavergne, S., Olesen, J.M., Gómez, D., García, M.B., 2017. Spatial Congruence between Taxonomic, Phylogenetic and Functional Hotspots: True Pattern or Methodological Artefact? Diversity and Distributions 23, 209–220.

[12] Pardo, I., Pata, M.P., Gómez, D., García, M.B., 2013. A Novel Method to Handle the Effect of Uneven Sampling Effort in Biodiversity Databases. PLoS ONE 8, e52786.

[13] Guralnick, R.P., Wieczorek, J., Beaman, R., Hijmans, R.J., 2006. BioGeomancer: Automated Georeferencing to Map the World’s Biodiversity Data. PLoS Biology 4.

[14] Robertson, M.P., Cumming, G.S., Erasmus, B.F.N., 2010. Getting the most out of atlas data. Diversity and Distributions 16, 363–375.