Research Data Management
From planning and applying for a research project, through the project's duration, to publication, a structured approach to research data saves time in the long term and increases the subsequent usability of research results.
Introduction and Definition
Research data are the basis of any research activity. The term research data can be defined in different ways and is therefore meant to be understood in a broad sense here.
Research data are [...] data generated in the course of a scientific project, e. g. through source research, experiments, measurements, surveys or interviews. (DFG 2009)
It is important to note that the DFG definition does not just apply to digital data and to keep in mind that, in practice, the traditional distinction between primary and secondary data often becomes blurred. Nevertheless, it is essential to distinguish research data from data about research, e. g. information about research projects.
Depending on the discipline, there are different workflows or guidelines for handling in-house generated data and for using external data. On an abstract level, however, data usually goes through the same general data lifecycle during the research process.
From the planning of a research project to short-term storage and exchange of data with project partners during the project’s duration to the publication and subsequent use by other researchers, each phase places different demands on the handling of research data. This division of the data lifecycle relies on the Research Data Lifecycle developed by the UK Data Service, among other concepts.
Why should research data management be addressed? The planned and structured handling of research data offers benefits:
- It saves time by avoiding redundant data collection and by allowing existing data to be reanalyzed, including with new methods.
- It secures non-replicable data.
- It complies with the rules of good scientific practice and promotes transparency and validity of the data.
- It helps to meet the requirements of funding agencies (EU, DFG, BMBF).
- It promotes scientific exchange and interdisciplinary collaboration.
- It increases visibility through the publication of research data.
The following information provides an overview of services offered by TH Köln and third-party organizations and points to existing guidelines and initiatives on research data management.
Planning and Structuring
At the beginning of any research project, it is helpful to think about certain aspects of the handling of the resulting research data.
For beginners, we recommend the information platform forschungsdaten.info. It provides a comprehensive introduction to the topic of research data management. It includes numerous helpful tips and web links that facilitate accessing more in-depth information.
When applying for research projects, it is vital to obtain information about the exact requirements of the research funding bodies. These are defined in general guidelines as well as in the individual announcements of funding programs. The most extensive requirements for research data management exist for Horizon 2020 projects unless applicants choose the so-called opt-out option.
When a specific research project is planned, creating a data management plan (DMP) can be helpful and is often prescribed. A data management plan summarizes the essential rules for handling research data in the project, e. g.:
- How is the project defined? What are the objectives, and who is responsible?
- How is the data collected, or what is the data source?
- How will data be documented and stored? What metadata and formats will be chosen?
- Where will the data be stored?
- To whom will the data be made available, and under what conditions?
- Are there legal or ethical aspects to consider?
- How will the data be archived after the end of the project, and for how long?
- What are the costs for RDM?
Detailed information on which points to consider in a DMP is provided, for example, by the Checklist for a Data Management Plan of the Digital Curation Centre or the Checklist for the Creation of a Data Management Plan in Empirical Educational Research of the Association for Research Data Education.
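As a rough illustration, the questions above could be tracked in a small structure that reports which points are still open. This is only a sketch: the class name, the field names, and the example values are hypothetical and are not taken from any official DMP template or checklist.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical container mirroring the DMP questions listed above."""
    project_definition: str = ""      # objectives and responsibilities
    data_collection: str = ""         # how data is collected, or its source
    documentation: str = ""           # metadata and file formats
    storage_location: str = ""        # where the data is stored
    access_conditions: str = ""       # who gets access, and on what terms
    legal_ethical_aspects: str = ""   # legal or ethical considerations
    archiving: str = ""               # archiving approach and retention period
    costs: str = ""                   # expected RDM costs


def unanswered_questions(plan: DataManagementPlan) -> list[str]:
    """Return the names of all fields that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Illustrative values only.
plan = DataManagementPlan(
    project_definition="Illustrative project; the project lead is responsible",
    storage_location="Group directory",
)
print(unanswered_questions(plan))
```

A plan in which every field is filled in yields an empty list, which can serve as a simple completeness check before an application is submitted.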
Saving and Sharing
Storage requirements for research data differ from project to project. The volume of data and data protection requirements play a role, as does the question of who should have access to the data. Some offers are listed below:
If you need advice on setting up personal storage space, please contact the IT admin of your respective faculty (available in-house). If a research project requires shared storage space for people from different organizational units, Campus IT offers group directories. These are suitable for use by a defined group of people at TH Köln over an extended period of time.
In addition, the non-commercial cloud platform Sciebo offers a free alternative to the frequently used commercial cloud providers for shared file storage. When using Sciebo, its regulations must be adhered to, especially those regarding data protection. Special project boxes in Sciebo can be requested for projects with large storage requirements.
A list of software solutions and further services offered by Campus IT for employees of TH Köln can be found on the intranet. Offers for students are listed there as well. Please direct specific requests for additional software directly to Campus IT (firstname.lastname@example.org).
Publishing and Archiving
When a scientific article is published, only selected research data is usually made accessible. However, an increasing number of journals expect authors to provide the associated original datasets, either for the peer review process or for subsequent publication.
If the journals themselves do not offer the possibility to publish the datasets, special data repositories can help. In a data repository, structured data is stored and provided with a persistent identifier (e. g. DOI). This makes the data permanently available and citable. There are subject-specific as well as subject-independent repositories. Further information on data repositories can be found on the platform forschungsdaten.info, and search options for finding an appropriate repository can be found here.
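To illustrate how a persistent identifier makes a dataset citable, the following sketch turns a DOI into a resolvable link via the public resolver at https://doi.org/. The function name is hypothetical, the regular expression is a simplified plausibility check rather than the full DOI syntax, and the DOI in the usage example is made up.

```python
import re
from urllib.parse import quote

# Simplified pattern (an assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Build a resolver URL for a DOI after a basic plausibility check."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash separator.
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical DOI, used for illustration only.
print(doi_to_url("10.1234/example-dataset"))
```

Because the link points to the resolver rather than to the repository itself, it should remain valid even if the repository later changes its own web addresses.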
When publishing or passing on research data to third parties, it is important to consider relevant legal aspects and to define the rules for subsequent use by selecting the appropriate license. The platform forschungslizenzen.de provides an introduction to this topic as well.
The TH Köln guidelines for ensuring good scientific practice also specify that the original datasets on which publications are based must be stored on durable and secure media for ten years. Project managers, or those responsible for the project, should take this into account, especially when an employee leaves.
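The ten-year period can be tracked with a simple date calculation. The sketch below assumes that the period starts on the publication date, which is not stated in the text above and should be checked against the guidelines themselves; the function name is hypothetical.

```python
from datetime import date


def retention_end(publication_date: date, years: int = 10) -> date:
    """Return the date on which the retention period ends.

    Assumption: the period starts on the publication date.
    """
    target_year = publication_date.year + years
    try:
        return publication_date.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return publication_date.replace(year=target_year, day=28)


# Illustrative date only.
print(retention_end(date(2024, 3, 15)))
```

Recording this date alongside the dataset helps whoever takes over responsibility after a staff change to see how long the data must still be kept.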
Workshops and Continuing Education on RDM
In nine modules, this certified course teaches, among other things, the research cycle in various disciplines, open science, RDM consulting, technical infrastructure, and data management, including legal aspects. The course takes about 10 months and is designed as a blended learning course that alternates live online sessions (80 hours) with subsequent self-study phases (approximately 125 hours). The total time required is 240 hours. Participants can optionally complete a work project with a workload of approximately 35 hours in order to obtain a certificate worth 8 ECTS.
This online course provides an introduction to RDM and sharing research data. It takes about 12 hours, and a certificate can be awarded on completion. The course description states: “...learners will understand the diversity of data and their management needs across the research data lifecycle, be able to identify the components of good data management plans and be familiar with best practices for working with data including the organization, documentation, storage, and security of data. Learners will also understand the impetus and importance of archiving and sharing data as well as how to assess the trustworthiness of repositories.”
The online e-learning offering “OpenGeoEdu”, co-developed by the Leibniz Institute for Ecological Spatial Development (IÖR), is aimed at students of spatial disciplines such as geography, spatial, urban or environmental planning, geodesy, agriculture, and forestry. Employees in science, planning, and administration can also use this freely accessible offering to refresh and deepen their knowledge of handling open geodata. More about the OpenGeoEdu project.
The FOSTER portal is an English-language e-learning platform for anyone who wants to know more about Open Science. It offers training courses on various topics, including “Managing and Sharing Research Data”, “Open Licensing” or “Data Protection and Ethics”.
MANTRA is an English-language self-study program with nine modules of up to one hour each (from Research Data Explained to Data Protection, Rights & Access). The content is tailored to different target groups, from students to experts.
On this website, you will find a German-language self-study offer on the subject of RDM with individual modules ranging from basic knowledge to case studies.