Documentation

The expectations for research data documentation are discipline-dependent, and some disciplines are more prescriptive than others. In general, research data documentation should include enough information for users to independently understand and replicate the results. There are many options for documenting your research data, including codebooks, data management plans, lab manuals, metadata, project manuals, and READMEs. A single project may use multiple types of documentation for different purposes.

One set of principles that can guide the creation of documentation is ALCOA+. ALCOA+ comes from the Food and Drug Administration (FDA) and is commonly used in clinical trials, the pharmaceutical industry, manufacturing, and other industries. Following the ALCOA+ principles is generally considered a sign of data integrity, meaning that the accuracy and consistency of the data are preserved regardless of changes made.

  • Attributable – where did the data come from? Who acquired it and when? Who performed what actions? What systems/devices/etc. were used to collect/process the data?
    • In practice, this may be as simple as including the time, date, and data collector with the data, and recording origins and edits in a README
  • Legible – the documentation is readable and understandable for all involved.
    • In practice, this may require having handwritten notes typed, but also can refer to including a key to any abbreviations, acronyms, and initialisms; an explanation of the data structure; or saving the records in a format that is unlikely to change (such as .txt or .csv)
  • Contemporaneous – documentation should occur as events occur.
    • In practice, this means capturing details at the time they happen so information is not lost
  • Original – the original documentation is preserved, and changes are documented. 
    • In practice, this typically means saving the original data file separately and making a copy before any edits are made. (This is extremely helpful in case mistakes happen)
  • Accurate – the information should be error-free and precise.
    • In practice, this refers to using systems that minimize errors, and documenting any corrections
  • +Complete – the documentation should include all data.
    • In practice, this means that the original data is available, and any changes are documented and explained
  • +Consistent – the data does not include conflicting or contradictory information, and is in the expected order.
    • In practice, this means organizing records and ensuring they are consistent, such as following a time sequence and/or making sure data is in the same order across files
  • +Enduring – the data is accessible and retrievable for the time required by regulations and sponsor requirements.
    • In practice, this means not relying on temporary recording methods, like dry-erase boards or scrap paper
  • +Available – authorized individuals are able to access and review the data.
    • In practice, this means data should be available over the course of the data life cycle, which may mean having a tracking form or document if data is stored in multiple places
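
Several of the practices above (keeping the original file untouched, recording who did what and when, and noting it in a README) can be partly automated. The following is a minimal Python sketch, not a prescribed ALCOA+ tool. The function names, the `README.txt` file name, and the log line layout are all assumptions made for illustration.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def start_working_copy(original: Path, work_dir: Path, person: str, note: str) -> Path:
    """Copy the original file into work_dir and log the action in a README.

    Hypothetical helper: the log format below is an illustrative choice.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / original.name
    if working.exists():
        # Never silently overwrite an existing file
        raise FileExistsError(f"{working} already exists; refusing to overwrite")
    shutil.copy2(original, working)  # the original file itself is not modified

    # Record who, when, and what at the moment the action happens
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"{timestamp} | {person} | copied {original.name} to {working} | "
        f"sha256 of original: {sha256_of(original)} | {note}\n"
    )
    # Append mode keeps all earlier entries in the README
    with open(work_dir / "README.txt", "a", encoding="utf-8") as readme:
        readme.write(entry)
    return working
```

Edits are then made only to the returned working copy. The recorded checksum gives a way to confirm later that the original file has not changed.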

ALCOA++ (c. 2023) has added the principle of Traceable, which refers to keeping detailed records about the locations and movements of data. Traceable elements include the timing of movements, how and where the data is stored, and who is responsible for different elements.
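
One lightweight way to keep such records is a tracking log saved as a .csv file, with one row per movement. The Python sketch below is only an illustration. The column names and both functions are hypothetical, and a spreadsheet or paper tracking form can serve the same purpose.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Assumed column layout for the tracking log (illustrative only)
FIELDS = ["timestamp_utc", "dataset", "from_location", "to_location", "responsible"]


def log_movement(log_path: Path, dataset: str, from_location: str,
                 to_location: str, responsible: str) -> None:
    """Append one row describing a data movement to a CSV tracking log."""
    is_new = not log_path.exists()
    with open(log_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once
        writer.writerow({
            "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "dataset": dataset,
            "from_location": from_location,
            "to_location": to_location,
            "responsible": responsible,
        })


def current_location(log_path: Path, dataset: str) -> Optional[str]:
    """Return the most recently logged location of a dataset, or None."""
    location = None
    with open(log_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["dataset"] == dataset:
                location = row["to_location"]  # later rows override earlier ones
    return location
```

Because rows are only ever appended, the log keeps the full history of where each dataset has been and who moved it.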

There are also standard operating procedures around Good Documentation Practices, such as this one from the Frederick National Laboratory for Cancer Research. 

Other resources that may be helpful:

Codebooks 

Data management plans

Electronic Lab Notebook 

Lab manuals 

Metadata

Project manuals 

READMEs

Other