What is an ARC?
An ARC (Annotated Research Context) is a data repository for storing and sharing research data and metadata. The concept was developed to make plant science data machine-readable. Published ARCs with a persistent identifier can be considered data papers that can stand alone or support research papers.
- ARCs are meant to include all data and relevant metadata referring to the research represented by the ARC.
- One of the core principles of ARCs is the separation of data and metadata (metadata = “data about data”).
- ARCs follow a predefined folder and file structure for storing and sharing data. The folder structure supports different categories of data; .xlsx files are used for storing metadata.
- ARCs are located in DataHUBs, such as the TRR356 PlantMicrobe DataHUB. The TRR356 PlantMicrobe DataHUB is one of several DataHUBs created for housing ARCs. Like all other DataPLANT DataHUBs it is a GitLab instance. Thus, ARCs are essentially GitLab projects.
- Tools have been created that allow users to interact with the DataHUB without having to master git.
For whom is the TRR356 PlantMicrobe DataHUB?
The TRR356 PlantMicrobe DataHUB was created to serve the Research Data Management needs of the TRR356 PlantMicrobe project for researchers within the project and their collaborators outside the project.
How do I sign up and log into the TRR356 PlantMicrobe DataHUB?
Find information on registration and login here. If you have problems signing in, get in touch with support-plantmicrobe@zdv.uni-tuebingen.de.
How do I interact with ARCs?
You can interact with ARCs in a browser via the DataHUB webpage (e.g. of the TRR356 PlantMicrobe DataHUB) or by using git. In addition, tools have been developed to create and manipulate ARCs: see the FAQ entry on ARCmanager, and other tools such as ARCitect, Swate-alpha or the Swate Excel Add-in.
We recommend initiating ARCs and uploading research data using ARCmanager, which uses git LFS automatically for files larger than 50 MB. If you do not use ARCmanager, make sure to use git LFS yourself for files larger than 50 MB.
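When working with git directly, it can help to know in advance which files exceed the 50 MB limit. The following is a minimal Python sketch, not part of any DataPLANT tool: the function name find_large_files is hypothetical, and 50 MB is read here as 50 * 1024 * 1024 bytes, which is an assumption.

```python
from pathlib import Path

# 50 MB limit from the text, interpreted as 50 MiB (assumption).
LFS_THRESHOLD_BYTES = 50 * 1024 * 1024


def find_large_files(arc_root, threshold=LFS_THRESHOLD_BYTES):
    """Return paths (relative to arc_root) of files larger than `threshold` bytes."""
    root = Path(arc_root)
    large = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip git's internal folder; only the ARC's own files matter here.
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.stat().st_size > threshold:
            large.append(relative.as_posix())
    return sorted(large)


if __name__ == "__main__":
    # "." stands for a local copy of an ARC (hypothetical location).
    for name in find_large_files("."):
        print(name)
```

Files listed by this sketch are the ones that would need to be handled by git LFS before being committed and pushed.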
Where does the ARC come from?
The ARC (Annotated Research Context) concept has been developed and is used by DataPLANT. General information on ARCs can be found in the DataPLANT Knowledge Base. The ARC structure follows the ISA (Investigation, Study, Assay) abstract model. An ARC is a collection of data (a crate) with a description. Behind the scenes, ARCs are transformed to RO-Crates (Research Object Crates) for machine-readability.
ARC data folder structure
The predefined ARC folders correspond to biological input materials or data (studies), measurement results (assays), computational tools or scripts (workflows) and their results (runs).
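The four folder categories can be pictured with a short Python sketch that creates them in an empty directory. This is an illustration of the layout only: the function name create_arc_skeleton is hypothetical, and a real ARC created with ARCmanager or ARCitect contains more than these folders (for example the isa-files).

```python
from pathlib import Path

# Top-level folders named in the text above.
ARC_FOLDERS = ("studies", "assays", "workflows", "runs")


def create_arc_skeleton(arc_root):
    """Create the four top-level ARC folders under arc_root and return their paths."""
    root = Path(arc_root)
    created = []
    for name in ARC_FOLDERS:
        folder = root / name
        # exist_ok=True makes the function safe to call on an existing ARC.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # "my_arc" is a placeholder directory name.
    for folder in create_arc_skeleton("my_arc"):
        print(folder)
```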
ARC metadata
ARC metadata are located in so-called isa-files, which are .xlsx files (MS Excel). All tools for making these isa-files are based on the “Swate workflow annotation tool for Excel”, in short Swate.
Isa-files are present in any project as isa.investigation.xlsx, in any ARC studies subfolder as isa.study.xlsx and any assays subfolder as isa.assay.xlsx.
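The naming rules above can be checked with a few lines of Python. The sketch below is not an official validator; the function name missing_isa_files is hypothetical, and it assumes that isa.investigation.xlsx sits at the top level of the ARC.

```python
from pathlib import Path


def missing_isa_files(arc_root):
    """List expected isa-files (relative paths) that are absent from a local ARC."""
    root = Path(arc_root)
    missing = []

    # Assumption: the investigation file lives at the top level of the ARC.
    if not (root / "isa.investigation.xlsx").is_file():
        missing.append("isa.investigation.xlsx")

    # Each subfolder of studies/ and assays/ should carry its own isa-file.
    for folder, filename in (("studies", "isa.study.xlsx"),
                             ("assays", "isa.assay.xlsx")):
        parent = root / folder
        if not parent.is_dir():
            continue
        for sub in sorted(p for p in parent.iterdir() if p.is_dir()):
            if not (sub / filename).is_file():
                missing.append((sub / filename).relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # "." stands for a local copy of an ARC (hypothetical location).
    for name in missing_isa_files("."):
        print("missing:", name)
```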
Administrative metadata (i.e. who did which work at what institution and published it where) is assembled in vertical tables that are compulsory for any ARC. In practice, most of these data will be within isa.investigation.xlsx. Here is how administrative metadata appears in ARCmanager.
Experimental metadata are stored in Annotation table sheets in isa.study.xlsx and isa.assay.xlsx files. Here is the ARCmanager manual on experimental metadata.
Metadata are annotated using controlled vocabularies and ontologies (see here or in ARCmanager). To achieve machine readability, these tables need to follow predefined formats. Tools have been developed for the creation and maintenance of isa-tables. Templates have been developed for annotating different kinds of research.
Ontologies
The usage of controlled vocabulary increases machine readability. Ontologies supporting DataPLANT metadata annotation can be searched at the TIB Terminology Service. Ontology search options are also included in all ARC tools. Metadata templates come with fully annotated column headers.
ARCmanager
ARCmanager is a browser-based tool for creating and maintaining ARCs and for applying changes to the ARC in the DataHUB. The ARCmanager manual provides step-by-step instructions.
For the TRR356 project, select “TRR356 PlantMicrobe DataHUB for the Transregio project TRR356” when logging in to ARCmanager.
Helper tools for making ARCs
For the TRR356 PlantMicrobe project we recommend using ARCmanager. For help and feature requests contact support-plantmicrobe@zdv.uni-tuebingen.de.
Tools for creating the basic ARC folder and file structure are ARCmanager (browser-based) and ARCitect (runs locally).
Browser-based tools for creating and modifying metadata sheets are integrated in ARCmanager or available as the online Swate application. Files created with the online Swate application have to be placed and renamed by the user.
ARCitect creates isa-files locally and includes the table tool of Swate. An Excel Add-in has been devised for local installation.
The .xlsx isa-files can also be edited in any application that can process such files, e.g. MS Excel or LibreOffice.
Metadata templates
Predefined templates (see the DataPLANT Knowledge Base or the ARCmanager manual) for annotating specific kinds of metadata are included in all Annotation table tools. In ARCmanager, the contributor of the template is noted in brackets after the template name. Some of the DataPLANT templates are geared towards data submission to international repositories such as ENA, PRIDE, GEO or MetaboLights. A TRR356 (ZMBP) collection of templates will be started if desired (contact support-plantmicrobe@zdv.uni-tuebingen.de).
git
Git is a version control system: it tracks versions of files. It is used by GitLab instances such as the TRR356 PlantMicrobe DataHUB. A short instruction for installing git was given in the Munich autumn 2023 workshop. If you wish to upload large files (larger than 50 MB) to the DataHUB using git or the ARCitect application, you need to install and use git LFS. To use ARCmanager, you do not need to install git or git LFS. Further instructions will follow.
To download entire projects as git projects using git, you need an Access Token. The option is in the left side panel after clicking your avatar. Please follow the instructions (p. 2).
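As a rough illustration of how an Access Token is used, GitLab instances generally accept the token in place of a password when cloning over HTTPS. The Python sketch below only assembles such a clone URL. The function name build_clone_url, the host datahub.example.org, the project path and the user name are all placeholders, not the real DataHUB values; the linked instructions remain the reference.

```python
from urllib.parse import quote


def build_clone_url(host, project_path, username, token):
    """Assemble an HTTPS clone URL that carries an access token as the password."""
    # Percent-encode the credentials so special characters cannot break the URL.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    path = project_path.strip("/")
    return f"https://{user}:{secret}@{host}/{path}.git"


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    url = build_clone_url("datahub.example.org", "group/my-arc", "jane", "TOKEN")
    print("git clone", url)
```

Note that a URL of this form stores the token in the .git/config file of the cloned project, so a token should be treated like a password and not shared.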
If you need further help, please get in touch with support-plantmicrobe@zdv.uni-tuebingen.de.
Do ARCs make international data repositories obsolete?
No. ARCs are not meant to replace well-established international repositories for different kinds of data such as ENA, PRIDE, GEO etc. ARCs are intended to collect all data relating to a research undertaking in one place and to improve the machine readability of the data. ARCs could also include metadata that do not find a place in established international repositories due to data format restrictions. The data stewards (support-plantmicrobe@zdv.uni-tuebingen.de) from TRR356 I01 will do their best to support you in making your ARCs great sources of information for humans and machines.
ARC publication
ARC publication involves:
(a) making a version of your ARC public in the TRR356 PlantMicrobe DataHUB. While this is not obligatory, it is strongly recommended, as it makes it easier for re-users to access individual files.
(b) obtaining a Persistent Identifier (PID), e.g. a DOI (Digital Object Identifier), for a packaged version of your ARC in a suitable repository. Data publication with a PID makes it possible to receive recognition for data publications.
Researchers from the University of Tübingen are advised to publish their data via FDAT; the LMU uses Open Data LMU and TUM uses mediaTUM. Other well-known repositories are Zenodo, Dryad or Figshare.
Access to teaching materials from past meetings and workshops
Z03 & I01 workshop Aug 2024: course material and OrthoFinder, also including ARCs, Linux and git cheat sheets, cloud computing and de.NBI access
ARCmanager introduction March 2024
“Munich Exercise”: Downloading and installing git, creating a DataHUB sign-in, obtaining a personal token, using git.
VERDA (Virtual Environment for Research Data and Analysis) introduction July 2023