Deployed site: https://ieco-lab.github.io/iEcoDMP/
“The goal is to turn data into information, and information into insight.” — Carly Fiorina
“It is a capital mistake to theorize before one has data.” — Sherlock Holmes
“Without data, you are just another person with an opinion.” — W. Edwards Deming
“Data is a precious thing and will last longer than the systems themselves.” — Sir Tim Berners-Lee
Data management is project management. A data management plan (DMP) standardizes how data are formatted, named, organized, and stored so that they are accessible, understandable, and reproducible. Data are any information collected or created for a research project (e.g., data tables, analysis scripts, figures, manuscripts, and reference libraries).
What counts as a research project depends on the work itself. Research projects are defined by project leads in collaboration with their PI(s). An easy guideline is to let the data define what is a project and what is a subproject. For instance, if your data will be incorporated into many papers, you would not want a separate project for each paper, because you would then have multiple copies of the data, and maintaining good data management practices across all copies would be difficult. In that case, each paper would be a subproject under a larger project defined by the data.
The DMP should be referenced each time a new project is started, so it is reviewed regularly. It is updated based on input from iEcoLab team members who are actively managing and creating research projects. All iEcoLab members are encouraged to read this DMP and to talk with their PI(s) if they would prefer to follow a different DMP for a particular project.
You can find the most up-to-date DMP and a quick guide by following the links in the top right corner of this page.
Project Delineation: In this section, guidelines for how to delineate projects and subprojects are outlined.
Project Roles: In this section, guidelines for important project leadership roles are outlined.
Authorship: In this section, reasons for creating authorship guidelines and a resource to use when creating them are provided.
Data Access and Ownership: In this section, guidelines for the accessibility and ownership of the data produced by research projects are outlined.
File Organization: In this section, guidelines for how to organize your files in a directory are outlined. The file organization scheme should be strictly adhered to.
File Naming: In this section, guidelines for how to name files are outlined. Naming conventions should be strictly adhered to.
File Documentation: In this section, guidelines for how to create readme and metadata files for data and changelogs for versioning are outlined, along with what to include in these documents. At the very least, a metadata file is required for all data tables.
Version Control: In this section, guidelines for how to version control your data are described. Specific guidelines for version control of analysis scripts will be described in a later section about Git Version Control.
Data Collection: In this section, guidelines for how data from the field, the laboratory, and mined digital sources should be collected are described. These guidelines do not cover the actual methods and techniques for generating the data, but rather how to properly record data and transfer them to a data file on a computer.
Data Formatting: In this section, guidelines for the formatting of data tables are described, including guidelines for column headers, text formatting, table structure, and file types.
Data Backup: In this section, guidelines for how to back up your files are outlined.
Data Storage and Sharing: In this section, guidelines for how to store and share your finalized data files are outlined.
In this section, guidelines for how to use an R package to organize, document, and run your analyses are outlined. This includes creating a package for your analyses and writing vignettes so that collaborators and other researchers can easily understand your code and reproduce your analyses.
Package Files: In this section, the folder structure is outlined and described.
Functions: In this section, how functions should be written in an R package and how they can increase the repeatability of a project’s analysis are described.
Vignettes: In this section, how vignettes should be written in an R package and how they can be used to communicate analyses and results are described.
Website: In this section, how websites are created using pkgdown is explained.
Using Git: In this section, guidelines for how to use Git to version control your R scripts and analysis package are outlined.
Borer, E.T., Seabloom, E.W., Jones, M.B. and Schildhauer, M., 2009. Some simple guidelines for effective data management. The Bulletin of the Ecological Society of America, 90(2), pp.205-214.
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L. and Teal, T.K., 2017. Good enough practices in scientific computing. PLoS Computational Biology, 13(6).
Ramapriyan, H. K., and P. J. T. Leonard. 2020. Data Product Development Guide (DPDG) for Data Producers version 1. NASA Earth Science Data and Information System Standards Office, 9 July 2020.
British Ecological Society: Guide to Reproducible Code
PI(s) – Principal Investigator(s); Matt Helmus and/or Jocelyn Behm
iEcoLab – Integrative Ecology Lab at Temple University
Research Project – Defined by the project leads in collaboration with their PI(s). Generally defined by the data collected for each project.
Project Lead – The researcher who oversees the day-to-day tasks of the project.
Data Manager – The researcher who is in charge of curating the data for the project. This person is often the same as the project lead but can also be any researcher working on the project.