How to handle research data

  • Legal issues

    Before you collect, process and publish scientific research data, you should check the legal framework and guidelines for handling research data. Personal data, for example, is regulated by data protection laws. If you process data collected from other persons or institutions, check in advance that its use is permitted: is the data released under a specific license? Also consult your employer to find out what exploitation rights you have to the data you have collected. Further information can be found in our FAQs on legal aspects of handling research data and in this expert assessment of legal aspects of research data management by the DataJus project (both only in German).

    Regulations

    • German Copyright Law
      Raw data is often not protected by copyright, although there are exceptions.

    Related Links
    Information on forschungsdaten.info (mainly in German)
    Legal aspects of Open Science (in German)

  • Documentation

    Metadata is used for the structured documentation and description of research data. It consists of descriptive and technical information. Research data should always be stored together with its metadata. In public repositories, this usually happens automatically and is mandatory. Before publication, metadata can be saved in a number of formats, e.g.

    • in a database.
    • in tables (e.g. Excel).
    • in a readme file (text, PDF).
    • in a structured XML file.
    • in the data file (e.g. in the file header).


    Good documentation ensures that data

    • is findable,
    • can be processed automatically, e.g. by search engines,
    • can be reused more easily, or at all,
    • can be cited and thus directly associated with the creator of the data,
    • is more valuable for science, since content, quality and state of processing can be evaluated.

    Basic descriptive metadata

    • Unique identifier
    • Title
    • Creator (primarily responsible researchers)
    • Collection date (also versions)
    • Format (if necessary: required software)
    • Subject area
    • Data description / abstract
    • Data collection (spatial / temporal)
    • Organisation
    • Legal Rights / License
    • Relations to other objects (data, texts...)
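
    As an illustration of the "structured XML file" option, the following minimal sketch writes the basic descriptive metadata listed above into a small XML document. The element names and all values are hypothetical and chosen only to mirror the list; they do not follow an official schema such as DataCite or Dublin Core.

```python
import xml.etree.ElementTree as ET

# Hypothetical example values; the element names mirror the list above
# and are NOT taken from any official metadata schema.
metadata = {
    "identifier": "example-dataset-001",
    "title": "Example survey on commuting habits",
    "creator": "Jane Doe",
    "collection_date": "2015-08-28",
    "format": "CSV",
    "subject_area": "Social sciences",
    "description": "Anonymised answers from an example survey.",
    "coverage": "Hannover, 2015",
    "organisation": "Example Institute",
    "rights": "CC0",
    "relation": "none",
}


def build_metadata_xml(fields: dict) -> str:
    """Return the fields as a small XML document, one element per field."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_metadata_xml(metadata)
print(xml_text)
```

    The resulting text can be saved next to the data file. For publication, a repository will normally ask for the same information in its own metadata form.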

    Related Links

    Information on forschungsdaten.info (mainly in German)
    General metadata schemas:

  • File and folder names

    Folder and file names should consist of elements that allow quick classification of content. For example, you can provide information about the creation date, the file version and the person editing the research data. Arrange these elements in a uniform format. Make sure that naming conventions are agreed on in advance, set out in writing and adhered to during the research process.

    The more information file names contain, the longer they become, and some programs cannot process very long file names. Information that is the same for all files in a folder should therefore be stored in the folder name instead.

    Tips for file naming

    • Dates in YYMMDD format, for example, 150828 for August 28, 2015.
    • Shorten personal names, e.g. to initials.
    • Use only the following characters for file names: A-Z a-z 0-9 _ (underscore)
    • Do not use umlauts, spaces or special characters, as many programs interpret these characters differently or do not display them correctly.
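
    The tips above can be sketched as a small helper that assembles and checks a file name. The order of the elements (date, topic, initials, version) and the function name are assumptions made for this example, not a prescribed convention; the dot before the file extension is added after the check because it is needed by most programs.

```python
import re
from datetime import date

# Only letters, digits and underscores are allowed in the name itself.
ALLOWED = re.compile(r"[A-Za-z0-9_]+")


def build_file_name(created: date, topic: str, initials: str,
                    version: int, extension: str) -> str:
    """Assemble a file name as YYMMDD_topic_initials_vNN.extension."""
    stem = f"{created:%y%m%d}_{topic}_{initials}_v{version:02d}"
    if not ALLOWED.fullmatch(stem):
        raise ValueError(f"file name contains disallowed characters: {stem!r}")
    return f"{stem}.{extension}"


name = build_file_name(date(2015, 8, 28), "survey", "JD", 3, "csv")
print(name)  # 150828_survey_JD_v03.csv
```

    A topic containing an umlaut or a space, such as "Größe" or "field notes", is rejected with a ValueError instead of producing a problematic name.
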
  • Data security

    Think carefully about where and how you store and secure your data. If you work with data that requires protection, you should restrict access to your immediate collaborators. Typically, these restrictions are governed by the read and write permissions and sharing privileges on institute servers or in file services such as Seafile. Free cloud storage services and unencrypted USB media are not an appropriate place for sensitive data. Personal research data must always be stored in encrypted form. You can encrypt entire file systems on mass storage such as hard disks or portable USB media so that unauthorized persons cannot access them. Most operating systems such as macOS, Windows and Linux already come with built-in software for this (FileVault, BitLocker, dm-crypt). For Windows, the open source tool VeraCrypt is also recommended. Alternatively, you can encrypt individual folders and files directly (file encryption). This is possible with the archive manager 7-Zip, some file managers or tools like GPG or OpenSSL.
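
    As a rough sketch of file encryption with GPG, the following example builds the command for passphrase-based (symmetric) encryption of a single file and runs it only if GPG is installed. The function names and the file name are hypothetical; the choice of AES256 as the cipher is an assumption for this example, not a requirement stated here.

```python
import shutil
import subprocess
from pathlib import Path


def gpg_encrypt_command(source: Path) -> list[str]:
    """Build the GPG call that encrypts one file with a passphrase."""
    target = source.with_name(source.name + ".gpg")
    return [
        "gpg",
        "--symmetric",              # passphrase-based, no key pair needed
        "--cipher-algo", "AES256",  # assumption: AES-256 as the cipher
        "--output", str(target),
        str(source),
    ]


def encrypt_file(source: Path) -> None:
    """Run GPG if it is installed; GPG asks for the passphrase itself."""
    if shutil.which("gpg") is None:
        raise RuntimeError("gpg was not found on this system")
    subprocess.run(gpg_encrypt_command(source), check=True)


# Hypothetical file name, shown only to illustrate the resulting command.
command = gpg_encrypt_command(Path("150828_interviews_JD_v01.csv"))
print(" ".join(command))
```

    Note that GPG writes an encrypted copy and leaves the unencrypted original in place, so the original still has to be removed or stored securely.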

    Further information

  • Data formats

    Data management begins with data collection. A variety of methods, e.g. measurements, simulations, surveys or text analyses, are used to generate or collect data. As a result, data is available in different forms and file formats such as tables, CAD data, image and raster data, transcripts, program code and much more. The chosen methodology, the type of data and its file formats determine whether and how data can be processed automatically. They also influence how compatible data is with other hardware and software systems and whether it remains readable in the long term.

    The data type determines the form in which data is presented and stored. For example, survey data is better processed in tabular form than as text. Complex data collections should be stored in a database rather than in an Excel sheet.

    It is important to consider the file format of your data. Some devices and many application programs store data in a proprietary format, which may not be readable with other software. It is therefore better to save or convert the data in an open format. This facilitates data exchange and long-term archiving of data.
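
    As a minimal sketch of storing tabular data in an open format, the following example writes survey-style records to a UTF-8 encoded CSV file and reads them back with the standard library only. The column names and values are invented for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical survey answers, one dictionary per respondent.
responses = [
    {"respondent": "R001", "age_group": "20-29", "commute_km": "12"},
    {"respondent": "R002", "age_group": "30-39", "commute_km": "4"},
]


def write_csv(rows: list[dict], target: Path) -> None:
    """Write the rows to a UTF-8 encoded CSV file with a header line."""
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_csv(source: Path) -> list[dict]:
    """Read the file back to confirm it opens without special software."""
    with source.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "survey_responses.csv"
    write_csv(responses, path)
    restored = read_csv(path)

print(restored == responses)
```

    CSV stores every value as text, so information such as number formats or units should be recorded in the accompanying metadata.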

    Recommendations on data formats can be found on the website of the RADAR repository.

  • Data publication

    Funders, universities and scientific organisations require or recommend that research data and other results are openly accessible after a project ends. This makes it easier to evaluate research findings and enables other researchers to reuse the data.

    However, it is neither possible nor useful to publish all data generated during the research process. Data worth publishing includes all data that is needed to understand a project’s outcome. The most important criteria are as follows:

    • Uniqueness: No duplicates of the data have already been published elsewhere.
    • Extremely limited reproducibility: The data cannot be re-generated or only at great expense.
    • High professional relevance: The data is of particular interest to your professional community or even across disciplines.
    • Basis of text publications: You have published a book or article based on the analysis of this data.

     

    To ensure that your data is reusable, please note the following:

    • Adequate documentation: Provide sufficient descriptive metadata so that the data set can be found in a database (e.g. a repository).
    • Readability: If possible, save the data in open, widely used formats that can be opened independently of platforms and do not require special (possibly not permanently available) hardware and software.
    • Rights: Check whether the rights of third parties may prevent publication (e.g. copyrights or personal rights). If this is the case, try to have all necessary rights granted in writing by the persons concerned. Provide your data with an open license (e.g. CC0) so that it can be used by anyone without restrictions.

     

     

  • Data repositories

    Archiving and publishing data in a dedicated data repository is a way to make data accessible and citable in the long term. Most repositories have specific requirements for the data they host, which should ideally be considered before the data is created. Usually these are some or all of the following requirements:

    • Use of open data formats that facilitate long-term archiving and access.
    • Mandatory metadata for documentation to increase findability and usability.
    • Assurance by the data provider that archiving the data and providing access to it do not violate copyright or data protection laws.
    • Use of licenses or agreements that facilitate subsequent use (e.g. Open Access, Open Access after an embargo period).

     

    What to consider when choosing a repository

    • Guaranteed data preservation for at least 10 years
    • Affordable fees for long-term data preservation
    • Metadata for each record that at least complies with the DataCite or Dublin Core standards
    • Unique, long-term persistent identifiers, e.g. a DOI, are assigned for each data set.
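
    The criteria above can be turned into a simple checklist when comparing candidate repositories. The following sketch is only an illustration: the class, the field names and the two example repositories are hypothetical, and whether fees count as affordable remains a judgment you enter yourself.

```python
from dataclasses import dataclass


@dataclass
class RepositoryProfile:
    """Hypothetical summary of what a repository offers."""
    name: str
    preservation_years: int
    metadata_standards: tuple[str, ...]
    assigns_persistent_ids: bool
    fees_affordable: bool


def unmet_criteria(repo: RepositoryProfile) -> list[str]:
    """Return the selection criteria that the repository does not meet."""
    problems = []
    if repo.preservation_years < 10:
        problems.append("preservation guaranteed for less than 10 years")
    if not repo.fees_affordable:
        problems.append("fees for long-term preservation are not affordable")
    if not {"DataCite", "Dublin Core"} & set(repo.metadata_standards):
        problems.append("metadata complies with neither DataCite nor Dublin Core")
    if not repo.assigns_persistent_ids:
        problems.append("no persistent identifier (e.g. DOI) per data set")
    return problems


# Invented examples, not descriptions of real repositories.
good = RepositoryProfile("Example Repo A", 10, ("DataCite",), True, True)
weak = RepositoryProfile("Example Repo B", 5, ("custom",), False, True)

print(unmet_criteria(good))
print(unmet_criteria(weak))
```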

    Repositories

    re3data.org
    Index of interdisciplinary and subject-specific repositories

    RIsources
    DFG Portal for Research Infrastructures

    Leibniz Universität Hannover


    RADAR
    Generic Data Repository operated by FIZ Karlsruhe and TIB

    ZENODO
    Generic Repository, funded by the European Union and operated at CERN.

  • Licenses

    Before sharing data with other parties, the requirements for re-use should be clarified. Researchers at LUH are advised to use open licenses for data publications. By assigning an open license, the author grants others the right to use, modify and redistribute the data without restriction. Licenses that limit these rights also exist, but they are not regarded as "open". Granting a standardised license is usually a prerequisite for publication in repositories.

    Licenses

     

     

  • Data management plan

    The handling of research data should be recorded in a data management plan (DMP).

    The content of a DMP:

    • Project overview
    • What kind of data is used in my project? (Self-generated data, pre-existent data)
    • How is the data managed? (filenames, storage location (internal / external), backups)
    • How is the data processed?
    • Which legal aspects must be considered? (Data protection, licenses, how do I distribute my data?)
    • Data sharing and publication
    • Who is responsible for which tasks involving the data (roles and responsibilities)?
    • What resources are available to me? (money, material, human resources)
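
    As a starting point, the list above can be turned into an empty plain-text skeleton that is then filled in for the project. This is a minimal sketch; the section wording is condensed from the list, and the function name and project title are hypothetical.

```python
# Section titles condensed from the list of DMP contents above.
DMP_SECTIONS = [
    "Project overview",
    "Data used in the project (self-generated, pre-existing)",
    "Data management (file names, storage location, backups)",
    "Data processing",
    "Legal aspects (data protection, licenses, distribution)",
    "Data sharing and publication",
    "Roles and responsibilities",
    "Resources (money, material, human resources)",
]


def dmp_skeleton(project_title: str) -> str:
    """Return a numbered plain-text outline with a placeholder per section."""
    lines = [f"Data management plan: {project_title}", ""]
    for number, section in enumerate(DMP_SECTIONS, start=1):
        lines.append(f"{number}. {section}")
        lines.append("   (to be completed)")
        lines.append("")
    return "\n".join(lines)


skeleton = dmp_skeleton("Example project")
print(skeleton)
```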


    In general, you should draft the data management plan in as much detail as possible for internal project use. If the research process deviates from the original planning or if certain aspects need to be specified in more detail, the data management plan must be updated.

    Online tools for developing DMPs

    DMPonline
    free, online tool for creating data management plans in English from the Digital Curation Centre (DCC)

    RDMO
    free DMP tool for institutional use, run as the institution's own instance

    Further Information

    LUH template for creating data management plans (in German)

    forschungsdaten.info (mainly in German)

    How to Develop a Data Management and Sharing Plan, Sarah Jones (DCC)

    Checklist for Data Management Plans from the DCC

  • Data storage and backup

    Collecting research data involves considerable money, time and effort. Losing the data, and the analysis that builds on it, can have a significant negative impact on your research. Anyone who digitally generates and evaluates research data must therefore ensure that nothing is lost and that the results are stored securely for a long time. The following principles should be observed:

    • Regularly back up relevant research data on suitable devices, or use professional backup services.
    • The backup intervals determine the possible loss rate in the event of a fault - the more frequently you save, the lower the possible loss of data.
    • Every backup is only as good as the data recovery: Test recovery on your computer before an emergency occurs.
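
    The last principle can be sketched in a few lines: copy a file to a backup location and compare checksums of the original and the copy, so that a damaged backup is noticed immediately. The function names, the folder layout and the sample file are assumptions for this example; a real backup should go to a separate device or service rather than a folder on the same disk.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy the file into backup_dir and check that the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source.name} differs from the original")
    return target


# Demonstration in a temporary folder with an invented sample file.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "measurements.csv"
    original.write_text("sample,value\nA,1.5\n", encoding="utf-8")
    copy = backup_and_verify(original, Path(folder) / "backup")
    restored_text = copy.read_text(encoding="utf-8")

print(restored_text == "sample,value\nA,1.5\n")
```

    Reading the copy back, as in the last lines, is a small version of the recovery test: a backup is only useful if the restored file can actually be opened.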

    Backup & Restore at the LUIS

    If possible, use the server of your institute for data storage, which is regularly backed up by the LUIS. With the Backup & Restore service, institutes and central institutions create backup copies of server data that change regularly or that belong to current projects. The service automatically stores copies on LUIS servers, where they are kept for a limited period of time.

    Alternatively, the "Sync & Share" service Seafile is available as part of the central file service, which automatically copies selected data to a LUIS server. In addition, this data can be distributed to other devices.

    Backup programs

    The data on your PC can be secured on external media (USB storage, DVD, tapes) with the built-in mechanisms of your operating system (Windows Vista or higher: Backup & Restore) or with special software. A list of these programs can be found on Wikipedia.

    Related links