
How to handle research data

  • Legal issues

    Before you collect, process and publish scientific research data, you should check the legal framework and guidelines for handling research data. Personal data, for example, is regulated by data protection laws. If you process data collected by other persons or institutions, check in advance whether you are permitted to use it. Also consult your employer to find out what exploitation rights you have to the data you have collected. Further information can be found in our FAQs on legal aspects of handling research data and in this expert assessment of legal aspects of research data management by the DataJus project (both available in German only).

    Regulations

    • German Copyright Law
      Much raw data is not protected by copyright, but there may be exceptions.

    Related Links
    Information on forschungsdaten.info (mainly in German)

  • Documentation

    Metadata is used for the structured documentation and description of research data. It consists of descriptive and technical information. Research data should always be stored together with its metadata. In public repositories, this is usually mandatory and happens automatically. Before publication, metadata can be saved in a number of formats, e.g.:

    • in a database.
    • in tables (e.g. Excel).
    • in a readme file (text, PDF).
    • in a structured XML file.
    • in the data file (e.g. in the file header).

    Good documentation ensures that data

    • is findable,
    • can be found automatically, e.g. by search engines,
    • is easier to reuse, or reusable at all,
    • can be cited and thus directly attributed to its creator,
    • is more valuable for science, since its content, quality and state of processing can be evaluated.

    Basic descriptive metadata

    • Title
    • Creator (primarily responsible scientists)
    • Collection date (also versions)
    • Format (if necessary, required software)
    • Subject area
    • Unique identifier
    • Data description / abstract
    • Spatial / temporal coverage of the data collection
    • Organisation
    • Legal Rights / License Terms
    • Relationship to other objects (data, texts...)
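
    One way to store these fields "in a structured XML file" is to map them onto the Dublin Core element set (namespace http://purl.org/dc/elements/1.1/). The following is a minimal sketch using only Python's standard library. The mapping shown (e.g. Organisation to dc:publisher, spatial/temporal coverage to dc:coverage), the plain `metadata` root element and all field values are assumptions for illustration; check which schema and element names your repository expects.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix

# Assumed mapping from the descriptive fields above to Dublin Core elements
FIELD_TO_DC = {
    "title": "title",
    "creator": "creator",
    "collection_date": "date",
    "format": "format",
    "subject_area": "subject",
    "identifier": "identifier",
    "description": "description",
    "coverage": "coverage",
    "organisation": "publisher",
    "license": "rights",
    "related_objects": "relation",
}

def build_dc_record(fields):
    """Return an XML string with one dc:* element per filled-in field."""
    root = ET.Element("metadata")  # placeholder root element (assumption)
    for key, dc_name in FIELD_TO_DC.items():
        value = fields.get(key)
        if value is None:
            continue  # skip fields that were not provided
        # Lists (e.g. several creators) become repeated elements
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = v
    return ET.tostring(root, encoding="unicode")

# Hypothetical example values
record = build_dc_record({
    "title": "Soil moisture measurements, test field A",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "collection_date": "2015-08-28",
    "format": "text/csv",
    "license": "CC0 1.0",
})
print(record)
```

    Fields left out of the dictionary simply produce no element, so the same function can be used while the metadata is still incomplete.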

    Related Links

    Information on forschungsdaten.info (mainly in German)

    General metadata schemas:


  • File and folder names

    Folder and file names should consist of elements that allow their content to be classified quickly. For example, you can include the creation date, the file version and the person editing the research data. Arrange these elements in a uniform order and format. Make sure that naming conventions are agreed in advance, set out in writing and adhered to throughout the research process.

    The more information file names contain, the longer they become, and some programs cannot process very long file names. Information that is the same for all files in a folder should therefore go into the folder name instead.

    Tips for file naming

    • Dates in YYMMDD format, for example, 150828 for August 28, 2015.
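    • Shorten personal names, e.g. to initials.
    • Use only the following characters in file names: A-Z a-z 0-9 . (period) - (hyphen) _ (underscore). Avoid : (colon) and / (slash): the slash separates folders in a path, and the colon is not allowed in Windows file names.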
    • Do not use umlauts, spaces or special characters, as many programs interpret these characters differently or do not display them correctly.
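
    The naming tips above can be wrapped in a small helper so that every project member builds file names the same way. This is a minimal Python sketch; the element order (date, initials, description, version) and the `v01` version style are assumptions, not a fixed convention, and should be replaced by whatever your project has agreed on. The character check accepts only letters, digits, period, hyphen and underscore, so umlauts and spaces are rejected.

```python
import re
from datetime import date

# Allowed characters: A-Z a-z 0-9, period, underscore, hyphen
ALLOWED = re.compile(r"^[A-Za-z0-9._-]+$")

def build_file_name(day, initials, description, version, extension):
    """Build a file name like 150828_JD_soil-moisture_v01.csv.

    Assumed order: date (YYMMDD) _ initials _ description _ version.
    """
    parts = [
        day.strftime("%y%m%d"),         # date in YYMMDD format
        initials.upper(),               # person shortened to initials
        description.replace(" ", "-"),  # no spaces in file names
        f"v{version:02d}",              # zero-padded version number
    ]
    name = "_".join(parts) + "." + extension
    if not ALLOWED.match(name):
        raise ValueError(f"disallowed characters in file name: {name!r}")
    return name

print(build_file_name(date(2015, 8, 28), "jd", "soil moisture", 1, "csv"))
# → 150828_JD_soil-moisture_v01.csv
```

    Because the date comes first in YYMMDD form, sorting the file names alphabetically also sorts them chronologically (within one century).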

  • Data formats

    Data management begins with data collection. Data is generated or collected with a variety of methods, e.g. measurements, simulations, surveys or text analyses. It therefore comes in many different file formats, such as tables, CAD data, image and raster data, transcripts, program code and much more. The chosen methodology, the type of data and its file formats determine whether and how data can be processed automatically. They also influence how compatible the data is with other hardware and software systems and whether it remains readable in the long term.

    The data type determines the form in which data is presented and stored. Survey data, for example, is more likely to be processed in tabular form than as text, and complex data collections are better stored in a database than in an Excel sheet.

    The file format of your data also matters. Some devices and many application programs store data in proprietary formats that other software may not be able to read. It is therefore better to save data in an open format, or to convert it to one. This facilitates data exchange and long-term archiving.
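
    For tabular data, plain-text CSV is one widely readable open format. The sketch below uses only Python's standard `csv` module to write measurement rows to a UTF-8 encoded CSV file with a header row; the column names, values and output file name are hypothetical. Data that currently sits in a proprietary format would first have to be read with software that understands that format, which is not shown here.

```python
import csv
from pathlib import Path

def save_as_csv(rows, columns, path):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical measurement data
rows = [
    {"timestamp": "2015-08-28T10:00", "depth_cm": 10, "moisture_pct": 23.4},
    {"timestamp": "2015-08-28T11:00", "depth_cm": 10, "moisture_pct": 22.9},
]
out = Path("150828_JD_soil-moisture_v01.csv")
save_as_csv(rows, ["timestamp", "depth_cm", "moisture_pct"], out)
print(out.read_text(encoding="utf-8"))
```

    CSV stores no units or column descriptions, so those belong in the accompanying metadata or readme file.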

    Recommendations on data formats can be found on the website of the RADAR project.

  • Data publication

    Funders, universities and scientific organisations require or recommend that research data and other results be made freely accessible after a project ends. This makes it easier to evaluate research findings and enables other researchers to reuse the data.

    However, it is neither possible nor useful to publish all data generated during the research process. Data worth publishing includes all data needed to understand the project’s outcomes. The most important criteria are:

    • Uniqueness: the data has not already been published elsewhere
    • Extremely limited reproducibility: the data cannot be regenerated, or only at great expense
    • High professional relevance: the data is of particular interest to your professional community or even across disciplines
    • Basis of text publications: you have published books or articles based on the analysis of this data


    To ensure that your data is reusable, please note the following:

    • Adequate documentation: Provide sufficient descriptive metadata so that the data set can be found in a database (e.g. a repository).
    • Readability: If possible, save the data in open, widely used formats that can be opened on any platform and do not require special (possibly not permanently available) hardware or software.
    • Rights: Check whether the rights of third parties (e.g. copyrights or personal rights) may prevent publication. If so, try to obtain all necessary rights in writing from the persons concerned. Publish your data under an open license (e.g. CC0) so that anyone can use it without restrictions.


  • Data repositories

    Archiving and publishing data in a dedicated data repository makes it accessible and citable in the long term. Most repositories have specific requirements for the data they host, which should ideally be considered before the data is created. These usually include some or all of the following:

    • Use of open data formats that facilitate long-term archiving and access
    • Mandatory metadata for documentation, to increase findability and usability
    • An assurance by the data provider that archiving the data and providing access to it do not violate copyright or data protection laws
    • Use of licenses or agreements that facilitate subsequent use (e.g. Open Access, Open Access after an embargo period)


    What to consider when choosing a repository


    • Guaranteed data preservation for at least 10 years
    • Affordable fees for long-term data preservation
    • Metadata for each record that complies at least with the DataCite or Dublin Core standard
    • Assignment of a unique, persistent identifier (e.g. a DOI) to each data set
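
    Before depositing, you could check your metadata against what a repository will require. As a minimal sketch, the Python function below tests a record for the six properties that the DataCite Metadata Schema marks as mandatory (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType) and runs a rough syntax check on the DOI. The flat dictionary is a simplification assumed for illustration, not the DataCite XML or JSON format, and individual repositories may require further fields.

```python
import re

# Properties marked as mandatory in the DataCite Metadata Schema
MANDATORY = ["identifier", "creator", "title", "publisher",
             "publicationYear", "resourceType"]

# Rough DOI syntax check: "10." + registrant code + "/" + suffix
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")

def check_record(record):
    """Return a list of problems found in a simplified metadata record."""
    problems = [f"missing: {key}" for key in MANDATORY if not record.get(key)]
    doi = record.get("identifier", "")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"not a DOI: {doi!r}")
    return problems

record = {
    "identifier": "10.1234/example-dataset",  # hypothetical DOI
    "creator": ["Doe, Jane"],
    "title": "Soil moisture measurements, test field A",
    "publisher": "Example University",
    "publicationYear": "2015",
}
print(check_record(record))
# → ['missing: resourceType']
```

    An empty list means the record passes these basic checks; it does not guarantee that a particular repository will accept it.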


    Repositories


    re3data.org
    Interdisciplinary search for subject-specific repositories

    RIsources
    DFG Portal for Research Infrastructures

    Leibniz Universität Hannover

    • LUIS data archive: non-public archive
    • LUIS is currently developing a repository through which university members can publish research data. Further information (in German) ...


    RADAR
    Generic Data Repository operated by FIZ Karlsruhe and TIB

    ZENODO
    Generic repository, funded by the European Union and operated by CERN.

  • Licenses

    Before sharing data with others, clarify the conditions for its re-use. Researchers at LUH are recommended to use open licenses for data publications. By assigning an open license, the author grants others the right to use, modify and redistribute the data without restriction. Licenses that restrict these rights are no longer regarded as "open". A standardised license is usually a prerequisite for publication in a repository.

    Licenses


  • Data management plan

    The handling of research data should be recorded in a data management plan (DMP).

    A DMP covers the following topics:

    • Project overview
    • What kind of data is used in my project? (Self-generated data, pre-existing data)
    • How is the data managed? (filenames, storage location (internal / external), backups)
    • How is the data processed?
    • Which legal aspects must be considered? (Data protection, licenses, how do I distribute my data?)
    • Data sharing and publication
    • Who does what with the data (roles and responsibilities)?
    • What resources are available to me? (money, material, human resources)
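
    If you keep a DMP as a plain-text file next to the project data, a short script can generate a skeleton based on the topics listed above, with a placeholder under each one. This Python sketch is only an illustration under that assumption; the heading format, the version line and the "TODO" placeholder are hypothetical choices, and funder-specific templates may require a different structure.

```python
from datetime import date

# DMP topics, based on the list above
DMP_SECTIONS = [
    "Project overview",
    "What kind of data is used in my project? (self-generated data, pre-existing data)",
    "How is the data managed? (file names, storage location, backups)",
    "How is the data processed?",
    "Which legal aspects must be considered? (data protection, licenses, distribution)",
    "Data sharing and publication",
    "Who does what with the data? (roles and responsibilities)",
    "What resources are available? (money, material, human resources)",
]

def dmp_skeleton(project, version=1, today=None):
    """Return a plain-text DMP skeleton with a numbered heading per topic."""
    today = today or date.today()
    lines = [f"Data management plan: {project}",
             f"Version {version}, {today.isoformat()}", ""]
    for number, section in enumerate(DMP_SECTIONS, start=1):
        lines.append(f"{number}. {section}")
        lines.append("   TODO")  # placeholder to be filled in
        lines.append("")
    return "\n".join(lines)

print(dmp_skeleton("Soil moisture study", today=date(2015, 8, 28)))
```

    Keeping a version number and date in the header makes it easy to track later adaptations of the plan.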


    For internal project use, draft the data management plan in as much detail as possible. If the research process deviates from the original plan, or if certain aspects need to be specified in more detail, update the data management plan accordingly.

    Online tools for developing a DMP

    DMPonline
    Free online tool from the Digital Curation Centre (DCC) for creating data management plans in English

    RDMO
    DMP tool for institutional use, run by institutions as their own instance (under development)

    Further Information

    LUH template for creating data management plans (in German)

    forschungsdaten.info (mainly in German)

    How to Develop a Data Management and Sharing Plan, Sarah Jones (DCC)

    Checklist for Data Management Plans from the DCC