Jornada LTER Data Management Plan

The Jornada Information Management System (JIMS) provides the infrastructure for the curation, protection, access, and analysis of Jornada LTER (JRN) data holdings. Our mission is to protect and provide access to publicly funded research data, tools, and findings that result from JRN research and associated collaborations. The purpose of our information management system is to provide protocols and services for data collection, verification, organization, archiving, and distribution. One of the primary tools for ensuring the long-term usefulness of data and products is detailed metadata that describe the research project and its related datasets. Metadata are shared and leveraged among the services and tools of the JIMS. We provide access to hundreds of data sets linked directly to our program or from other research locations and management agencies in support of multi-user needs. Our intent is to provide data sets, science-based information, tools, and technologies that can be used, through simple or more complex analyses, to address the needs of a diverse user community. JIMS is a multi-organization system that contains data and metadata holdings and ancillary information from the Jornada through the LTER, USDA, and AISE as well as collaborative efforts, either through data collection and storage for other sites (e.g., BLM-Malpai Borderlands) or through development of tools to improve data access and analysis (e.g., EcoTrends: https://ecotrends.info; The Nature Conservancy Landscape Toolbox: https://www.landscapetoolbox.org/).

A. Information management system. Our system consists of six major components: (a) data management implementation/process, (b) management of data, spatial maps, and imagery, and the creation of and access to associated metadata, (c) formal data management protocols, (d) tools and resources dedicated to harvest, document, archive, manage, and make data accessible, and tools to access, analyze, and download the data and metadata, (e) networking and computing services, and (f) support staff.

(a) Data management implementation. Our site manager, John Anderson, acts as the liaison between researchers and the information management team. His involvement begins during the Project Design phase, when a researcher completes the Jornada Notification of Research form prior to the start of work. This form alerts the Information Manager (IM) to the new study and potential LTER data sets. Upon initiation of a new study, the researcher completes a Project Documentation form that provides the second level of "metadata" documentation, and arranges for the LTER field crew to record GPS coordinates of the data collection sites. Research-related forms can be found at https://jornada.nmsu.edu/lter/data/documentation.

In the data collection phase, the IM helps researchers to design field and laboratory data sheets that facilitate data entry and analysis. The investigator completes a Data Set Documentation form to provide the metadata that fully describe the data set. Both Project and Data Set Documentation forms are provided with the data set when it is requested or obtained from our website. JIMS data entry programs validate data upon entry. Computer files are subjected to further verification by graphing and/or error-checking programs, and/or examination by the responsible investigator. Final quality assurance rests with the investigator who submits data for inclusion in the Data Management System. Direct communication between researchers and the IM ensures the timely submission and accessibility of data, as required by NSF guidelines.

One of the biggest challenges is migrating historic and current data into formats that are consistent with database rules and that support geospatial analysis and mapping. Processing data files that were collected or designed without database protocols creates an enormous workload. Our approach, which focuses on continued interactions between researchers and the IM, minimizes this workload (Fig. C1).

Fig. C1.

 

(b) Management of data and metadata. We employ many tools in this system, including SQL Server, GIS Server, geodatabases, Drupal, data entry systems, and open-source geoportals. We are building a system that optimizes the contribution of each tool to the total system. We also optimize the roles of people on the IM team as the system evolves. We are building, and have pilot tested, an approach to information management that integrates tabular research data with the spatial component of data sample location as a complete dataset package. This approach allows us to easily integrate data from more than one research program or project, and facilitates our research.

(c) Data protocols and metadata standards. Procedures are conducted in accordance with recommendations and guidelines developed by the LTER Network-Level Information Managers Committee (IMC). Data access, acknowledgement, and data management policies can be found at https://jornada.nmsu.edu/lter/data/policies. JRN data policies are in accordance with those developed by the LTER IMC (https://lternet.edu/data/netpolicy.html). Compliant EML is being produced for each dataset before being harvested into the LTER Data Portal.

(d) Tools and resources dedicated to harvest, document, archive, manage, and make data accessible. Data derived from LTER funding are made freely and publicly available within two years after collection (see Suppl. Table A1). Data are routinely updated online, typically within one day of receipt by the IM. Data are designated as either "Unrestricted" and available online, or "Restricted" with release authority held by the responsible investigator. Restricted datasets are those in preparation for publication or part of student research that is protected to allow the students the opportunity to publish.
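The release rule described above can be sketched in a few lines of Python. This is an illustration only, not the JIMS implementation: the function names are hypothetical, and the sketch assumes that an investigator's hold on a dataset lapses once the two-year window has passed, which the policy text implies but does not state outright.

```python
from datetime import date

EMBARGO_YEARS = 2  # from the policy text: public within two years after collection


def public_release_deadline(collected: date) -> date:
    """Latest date by which LTER-funded data should be publicly available."""
    try:
        return collected.replace(year=collected.year + EMBARGO_YEARS)
    except ValueError:
        # Collected on 29 February and the target year is not a leap year.
        return collected.replace(year=collected.year + EMBARGO_YEARS, day=28)


def access_level(collected: date, today: date, investigator_hold: bool) -> str:
    """Return "Restricted" or "Unrestricted" for a dataset.

    Assumption (not from the plan): a hold requested by the responsible
    investigator applies only until the public release deadline.
    """
    if investigator_hold and today < public_release_deadline(collected):
        return "Restricted"
    return "Unrestricted"
```

A dataset collected on 1 June 2010 with an investigator hold would be reported as "Restricted" on 1 January 2011 and as "Unrestricted" from 1 June 2012 onward.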

Our website offers options for users to access, query, and understand the datasets online before deciding to download. We will continue to add data accessibility and analytical tool capabilities as they are made available from our collaborative projects (EcoTrends, Landscape Toolbox). Datasets are delivered to users as downloadable dataset packages from the data catalogs and geoportals through web queries. The dataset package includes metadata files, data files (with coordinates for each data record), and a shapefile (spatial representation of dataset location). We plan to have all long-term JRN LTER datasets in this system by the end of May 2012 (see Suppl. Table A1). These datasets are currently available online at https://jornada.nmsu.edu/data-catalogs/jornada, and a listing is available at https://jornada.nmsu.edu/data-catalogs/long-term. The data table, as well as the spatial location where the data were collected, are treated as integrated objects available in more than one format, such as comma separated value (CSV) text files and a shapefile. The CSV files with x,y coordinates can be easily added to any spreadsheet, database, GIS, or analytical software. The shapefile format can be used with most GIS systems.

Geodatabase -- The Jornada geodatabase runs the ESRI ArcSDE spatial data engine on SQL Server 2008. The geodatabase provides storage and access to JRN spatial and tabular research data holdings. The geodatabase is also used to create and manage metadata in FGDC format, which is subsequently used by the JRN website and geoportal to allow users to visualize, search, and access JRN GIS and tabular data. Map and image services are created from geodatabase resources, and are provided by the GIS server to the Geoportal and other web-mapping applications. This approach provides a visual display of the cataloged spatial datasets to the user. The geodatabase is also used to integrate spatial and tabular research data. Geographic coordinates, as well as other key dataset identifier fields, are inserted into CSV files so that the data can be easily imported into a wide range of analytical software packages and databases.
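The step of inserting coordinates into tabular records before CSV export can be illustrated with a minimal Python sketch. It uses only the standard library and does not reflect the actual geodatabase workflow; the column names (`site_id`, `x`, `y`), the lookup table, and the sample values are all hypothetical.

```python
import csv
import io


def add_coordinates(records, site_locations):
    """Return copies of the records with x and y added from a site lookup."""
    enriched = []
    for record in records:
        x, y = site_locations[record["site_id"]]
        enriched.append({**record, "x": x, "y": y})
    return enriched


def to_csv(rows):
    """Serialize a list of dictionaries to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example values, not actual JRN data.
records = [
    {"site_id": "A1", "date": "2011-05-01", "value": "12.5"},
    {"site_id": "B2", "date": "2011-05-01", "value": "9.1"},
]
site_locations = {"A1": (100.0, 200.0), "B2": (150.0, 250.0)}

csv_text = to_csv(add_coordinates(records, site_locations))
```

Because every record carries its own coordinates, the resulting file can be opened in a spreadsheet or loaded as a point layer in GIS software without a separate join.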

Website, Data Catalogs, and Geoportals -- The JRN website provides access via a data catalog and geoportal to data as well as personnel information, publications, research proposals, reports, and other information about the Jornada and its research activities and collaborations. The website follows the LTER website design recommendations. Recently, the Jornada moved to a Drupal content management system to host websites for all JRN-related research projects and collaborations. The original JRN website is being moved into this combined website, which uses the Drupal Environmental Information Management System (DEIMS) to support the data catalogs. DEIMS was initially developed by the LTER Network Office (LNO), and has been adopted by several other LTER sites (ARC, LUQ, NTL, NWT, PIE, SEV, VCR) as a common approach to making data available and to generating EML that conforms to LTER best practices, which will be harvested into the LTER Network Information System (NIS). EML generated by DEIMS is being harvested into the current LTER Data Portal (network-level metadata search engine).

We integrated ESRI open-source geoportals into JIMS. The geoportals provide textual searches via keywords as well as the ability to query by geographic extent and to map research site locations. Although primarily developed to facilitate the distribution of spatial datasets, the geoportal can also be used to query and deliver a wide range of products, including documents and tabular data, and to integrate with other data portals using web services. Registered users can save multiple search terms for use when they revisit the site at a later date. Data providers can manually publish datasets in the portal, or the geoportal can be configured to automatically publish datasets when properly formatted FGDC metadata are added or updated in a specific internal directory. The interface will also allow a user to select a bounding extent to limit or clip spatial datasets and automatically e-mail the customized files to the user as a zip file. The geoportal has the capability to deny access to restricted datasets or grant access to only selected registered users as defined by the portal administrator.
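Limiting a dataset to a user-selected bounding extent can be sketched as follows for the simplest case, point records that carry x,y coordinates. The sketch is not the geoportal's own clipping code, which must also handle lines, polygons, and rasters; the `Extent` type, the function names, and the `x` and `y` column names are assumptions made for illustration.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Axis-aligned bounding extent in the dataset's coordinate system."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def within_extent(x: float, y: float, extent: Extent) -> bool:
    """True if the point lies inside the extent or on its boundary."""
    return (extent.min_x <= x <= extent.max_x
            and extent.min_y <= y <= extent.max_y)


def clip_to_extent(rows, extent: Extent):
    """Keep only the rows whose x,y coordinates fall within the extent."""
    return [
        row for row in rows
        if within_extent(float(row["x"]), float(row["y"]), extent)
    ]
```

Rows are compared as floats so that the function accepts coordinates read directly from a CSV file as text.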

We plan to integrate Drupal and the geoportals to allow seamless access to both systems without requiring users to log in to each separately. As GIS-enabled data packages are created, the EML files are updated using Drupal to point to the data package.

(e) Networking and computing services. The Jornada site offices and laboratories, located in Wooton Hall on the campus of NMSU, are connected to a local area network (LAN) and, through a firewall, to the NMSU network (Gigabit Ethernet). Most computers and all servers are connected to the LAN using Gigabit Ethernet (1000 Mb/s). We plan to increase bandwidth from the field station to the NMSU campus from 1.54 Mb/s to 50-75 Mb/s as soon as possible using high-speed, multi-hop, point-to-point wireless radios. The increased bandwidth will support streaming data and video, and remote education activities (K-12) from the wireless network covering the research site. We plan to continue increasing the wireless coverage (cloud) across the research site to provide Wi-Fi and 900 MHz spread spectrum connectivity for researchers, educators, and scientific instrumentation.
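The practical effect of the planned bandwidth increase can be estimated with simple arithmetic. The sketch below assumes that the link speeds quoted above are in megabits per second, ignores protocol overhead, and uses a hypothetical 1 GB file as the example; it is a rough estimate, not a measurement.

```python
def transfer_seconds(size_bytes: float, link_megabits_per_second: float) -> float:
    """Idealized transfer time: total bits divided by link rate in bits/s."""
    bits = size_bytes * 8
    return bits / (link_megabits_per_second * 1_000_000)


ONE_GIGABYTE = 1e9  # hypothetical file size, decimal gigabyte

current_link = transfer_seconds(ONE_GIGABYTE, 1.54)  # existing field station link
planned_link = transfer_seconds(ONE_GIGABYTE, 50)    # low end of planned range
```

Under these assumptions a 1 GB file needs well over an hour on the existing link and under three minutes at 50 Mb/s.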

Jornada servers support two resource pools: development and production. Each resource pool supports multiple virtual servers running multiple operating systems (Linux, Windows). The resource pools are configured to provide high availability and workload balancing to ensure the servers are available 24 hours a day, 365 days a year. If one of the physical servers (hypervisors) within a pool fails or is brought down for maintenance, the virtual servers running on the server are automatically transferred to another hypervisor. To a user connected to services provided by one of the virtual servers, the server will appear to have a slight delay (15-30 seconds), but otherwise the user will see no apparent effect from the virtual server being transferred to another hypervisor. Workload balancing allows virtual servers to be redistributed to other hypervisors in the resource pool to ensure optimal performance in the event a hypervisor starts to slow down due to workload. Currently, we have four physical servers within the production resource pool and two in the development pool. Additionally, servers that are not virtualized provide directory services (Active Directory, LDAP), backup, and workload balancing storage for the resource pools. Server storage is centralized using a storage area network (SAN) and provides 93 TB of storage capacity. The servers and SAN are connected redundantly to allow for hardware failure without impacting server performance.
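The failover behavior described above can be illustrated with a toy model. This is not how any particular hypervisor platform decides placement; real products weigh CPU, memory, and storage metrics. The sketch simply moves each virtual server from a failed host to the surviving host with the fewest virtual servers, and all host and server names are hypothetical.

```python
def fail_over(placement, failed_host):
    """Redistribute virtual servers from a failed hypervisor.

    placement maps hypervisor name -> list of virtual server names.
    Returns a new mapping that omits the failed hypervisor.
    """
    survivors = {
        host: list(vms) for host, vms in placement.items() if host != failed_host
    }
    if not survivors:
        raise RuntimeError("no hypervisor left in the resource pool")
    for vm in placement.get(failed_host, []):
        # Choose the least-loaded survivor; ties are broken by host name.
        target = min(survivors, key=lambda host: (len(survivors[host]), host))
        survivors[target].append(vm)
    return survivors


# Hypothetical pool layout.
pool = {"hv1": ["web", "db"], "hv2": ["gis"], "hv3": []}
after_failure = fail_over(pool, "hv1")
```

The original mapping is left unchanged, so the same function can be used to preview the effect of taking a host down for maintenance.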

Multiple forms of backup are incorporated to protect data and systems from disaster and to allow for rapid recovery if a disaster occurs. Servers and switch closets are physically secured and environmentally controlled to provide security and protection. Differential backups are performed nightly on all servers and many desktop computers using a dual-drive LTO-4 tape library directly attached to the SAN. Backup media are stored off-site in case of catastrophe and are reused after three months. Virtual server snapshots are performed prior to system upgrades or modifications to allow rapid recovery in the event these alterations produce undesirable results. The data archive volume is backed up routinely to DVDs and hard drives stored off-site. The DVDs are not reused, but are saved indefinitely. We are exploring mechanisms to automate and schedule server snapshots at little or no additional cost. We are also exploring disk-to-disk backups and alternative technologies to replace our tape library.
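A differential backup copies everything that has changed since the last full backup, rather than since the previous night's run. The following sketch shows that selection rule only; it does not represent the backup software in use at the Jornada, and commercial tools typically track changes through their own catalogs rather than file timestamps. The function name and input structure are hypothetical.

```python
from datetime import datetime


def differential_selection(file_mtimes, last_full_backup):
    """Return the sorted paths modified after the last full backup.

    file_mtimes maps a file path to its last-modified datetime.
    """
    return sorted(
        path for path, modified in file_mtimes.items()
        if modified > last_full_backup
    )
```

Because the reference point is always the last full backup, a restore needs only that full backup plus the most recent differential.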

(f) Personnel. Our IM team consists of four full-time staff jointly supported by the JRN and USDA (Ken Ramsey: Information Manager; Jim Lenz: Network and Systems Administrator; Valerie LaPlante: Multimedia and Website Administrator; Scott Schrader: Geoportal Administrator). Student employees and graduate assistants support data entry and computer programming efforts. Team members' skill sets complement one another, with some overlap to allow for temporary absence and employee turnover.

B. Milestones and deliverables relative to LTER network activities. Ongoing JRN participation in LTER network-wide activities includes the LTER Data Portal, All-Site Bibliography, and Climate databases as well as representation and participation at the annual Information Managers (IM) Meeting, IM Executive Committee, and NIS workshops. These activities are associated with expanding the capability of the JRN to acquire, maintain, and exchange information in a timely fashion to meet our milestones and deliverables (Fig. C2), and to share this information with other LTER and non-LTER users via the JRN website and the NIS being developed at the LNO.

Fig. C2. JRN milestones and deliverables. NOTE: The term "NIS Ready" indicates that the EML metadata are complete, accurately describe the related data file, and follow LTER best practices currently being defined by the LTER IMC as the NIS is developed by the LNO.

Members of the JRN IM team are active participants in the NIS development. Ken Ramsey is participating in three NIS development tiger teams. Ken Ramsey and John Anderson are participating in three cross-site IM working groups (WG) to advance efforts to prepare site data and associated metadata for inclusion in the NIS: the SensorNIS WG is developing best practices for preparing near real-time streaming sensor data; the DEIMS WG is developing a common approach for creating EML; and the GeoNIS WG is developing best practices for inclusion of GIS and remote sensing data.

We continue to develop the EcoTrends website by adding datasets from > 50 sites within the US and abroad, deriving new data variables, and improving data accessibility and analytical tools. This website was migrated from the LNO to a Jornada virtual server in LTER-V. The content was updated during this process to correct problems or omissions that had not been previously identified. We continue to develop the next iteration of the EcoTrends website and have dedicated four full-time staff and several students to this project. We plan to implement the LTER NIS at the Jornada as soon as possible to explore integration of the new EcoTrends website with the data and metadata web services of the NIS. We are also integrating the P2ERLS website of general information (e.g., ecosystem type, long-term mean precipitation, temperature) from > 300 sites distributed globally (https://www.p2erls.net) with the EcoTrends website (https://ecotrends.info).

Recently, activity and discussion within the LTER Network resulted from the Bob Robbins video illustrating problems he encountered while trying to access data from each website. JRN responded quickly to these problems by immediately implementing a website redirect to forward users to the current data catalog page. We then created EML using DEIMS to replace the JRN EML documents used to search for our datasets on the LTER Data Portal. These documents now point directly to the appropriate dataset section of the data catalog. During this process, we increased the quantity of datasets in the LTER Data Portal as well as the quality and congruency of the EML metadata. Through Ken Ramsey's membership on the NIS Data Portal tiger team, we will continue to work with the LNO and the NIS developers to ensure that the current LTER Data Portal and planned LTER NIS Data Portal allow users to more easily access JRN datasets and associated metadata.