Glossary

Explaining important terms

This glossary is based on the glossary of the RADAR project and on articles from the wiki forschungsdaten.org.


  • Bitstream Preservation

    Digital data consist of a fixed sequence of bits (bit stream), with each bit representing either the value 1 or 0. Bit stream preservation means that this sequence stays exactly the same. Due to ageing processes, storage media tend to corrupt individual bits over time. To avoid this, the medium must be replaced on a regular basis. Copying data to a new medium is also necessary when technological advances lead to the widespread use of a new kind of medium. Bit stream preservation is a basic requirement for long-term archiving of digital data.
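
    Whether a bit stream has stayed the same can be checked by comparing checksums. The following is a minimal sketch, assuming SHA-256 as the checksum and a digest recorded when the file was first archived; the function names are illustrative and not taken from any particular archive system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """True if the file's current digest matches the one recorded earlier."""
    return sha256_of(path) == recorded_digest
```

    A digest recorded at deposit time can be re-checked after every copy to a new medium; a mismatch indicates that at least one bit has changed.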

  • Certification

    In the context of research data, this term usually refers to repositories. Repositories can receive a certificate if they comply with certain standards. The certificate attests to both the quality and the trustworthiness of the repository.

  • Citation Guidelines

    How scientific data publications are cited varies widely between subject areas and research disciplines. Various scientific groups are currently working on the citation of research data, so a uniform standard does not (yet) exist.

  • Copyright

    Some kinds of research data, such as drawings and photographs, may constitute a “work” and hence be subject to copyright laws (German). This is the case if they reach the necessary level of creativity and originality. (Measurement) data generated exclusively by machines usually do not qualify. If data are protected by copyright, all rights to use, exploit and reproduce them lie with the persons who created them, unless legal agreements reached beforehand state otherwise (e.g. a work contract, a cooperation agreement or a contract on a commissioned project). The originators may, however, cede these rights in order to enable others to re-use their works.

    In the case of publicly funded research, many funders expect all data to be made accessible to everyone free of charge, as long as there are no legal constraints. Wherever possible, re-use should be allowed without restrictions. To this end, it is advisable to assign a licence to (possibly) copyright-protected research data. The Creative Commons licences CC0 (no rights reserved) and CC BY (originators must be indicated), for example, are especially well known and proven.

  • Creative Commons Licences

    In some cases, the creators of research data may hold a copyright (e.g. on photographs or drawings). Re-using such data would require the authors' explicit permission, which often causes unintended complications and uncertainties when the data are to be shared with third parties. It is therefore advisable to provide the data with a licence that clearly states the conditions for re-use. The Creative Commons (CC) licences are especially well known and established. CC licences exist in several varieties, each allowing or prohibiting certain kinds of usage. Research data are generally best licensed under CC0 (public domain), meaning that the authors waive any copyrights they may hold. When re-using the data in a scientific context, it is nevertheless mandatory to name the authors in order to comply with the rules of good scientific practice.

  • Data Archive

    A data archive is a facility for the long-term storage of digital data in their original state (bitstream preservation). This includes backup copies and the regular replacement of storage media. If additional services such as migration to newer file formats or online publication are available, the facility is not a mere archive but a repository.

  • Data backup

    Data backup is the temporary duplication of data to avoid data loss due to technical faults. The data backup usually includes the synchronization of the currently used work environment. For the best possible security, at least two copies should exist in different locations. Data on servers of the computer center of Leibniz Universität Hannover are automatically backed up redundantly.

  • Data Journal

    Data journals publish articles (so-called data papers) which document the processes of data generation, including the instruments and methods applied. These descriptions enable the best possible reusability of the data. In some cases, the journals provide an in-house repository where the described data themselves may be deposited, but the data can also be kept elsewhere. In either case, the location should be referenced in the data paper using a persistent identifier such as a Digital Object Identifier (DOI).

  • Data management plan

    A data management plan (DMP) is an instrument for planning and structuring research projects. The DMP documents the type and amount of data in order to derive requirements for data handling. Ideally, a DMP is created before the project starts. It evolves over the course of the project, since it is a "living document".

  • Data Publication

    In order to publish data, they must be stored in a suitable repository and be publicly accessible there via the Internet. Many repositories also offer the possibility of restricting access to certain groups of persons (e.g. scientists only) or of making the data available only on request and with the express consent of their author. In order to make published data citable over the long term, they should be available via a permanent link. This is ensured by assigning a persistent identifier, e.g. a DOI (Digital Object Identifier).

  • Data protection

    Data protection comprises technical and organisational measures to prevent the loss of, unauthorised access to and abuse of personal data. Personal data are all data that directly or indirectly enable the identification of an individual (e.g. name, address, IP address or e-mail address). In general, collecting personal data is only allowed if the persons concerned explicitly agree, though there are restrictions and exceptions (e.g. for certain authorities and use cases).

    In research, personal data arise especially in medical studies and in the social sciences. In these cases, encryption and storage in particularly well protected places are absolutely mandatory. By retrospectively pseudonymising or anonymising the data, however, links to specific individuals may be removed to a degree that even the publication of the data becomes legally possible.

    The European General Data Protection Regulation has been in force as directly applicable law since 25 May 2018. The German Federal Data Protection Law and the Data Protection Law of Lower Saxony (German only) have been adapted accordingly. For further information, please refer to the website of the Data Protection Officer of Leibniz University.
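
    Pseudonymisation, as mentioned above, can be illustrated with a minimal sketch. It assumes that a keyed hash (HMAC-SHA256) is an acceptable way to replace direct identifiers; this technique is an assumption made for illustration and is not prescribed by the sources of this glossary. The records, field names and key are invented.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    mac = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Shortened to 16 hex characters for readability (an arbitrary choice).
    return mac.hexdigest()[:16]


# Hypothetical example records; names and values are invented.
records = [
    {"name": "Alice Example", "score": 12},
    {"name": "Bob Example", "score": 7},
    {"name": "Alice Example", "score": 15},
]

# Assumption: the key is stored securely and separately from the data.
key = b"keep-this-key-separate-from-the-data"

# Replace the name with a pseudonym; the same person keeps the same pseudonym.
pseudonymised = [
    {"person": pseudonymise(r["name"], key), "score": r["score"]}
    for r in records
]
```

    Because anyone holding the key can recompute the mapping, the result is pseudonymised rather than anonymised; anonymisation would additionally require that no such link can be restored.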

     

  • Digital Object Identifier (DOI)

    A Digital Object Identifier (DOI) is a persistent identifier used for citing and linking objects. It uniquely and permanently identifies the object in digital environments. A DOI consists of two parts, a prefix and a suffix, separated by a slash (e.g. DOI: 10.1000/123456). Further information on DOI registration of research data can be found at the DOI Service of the German National Library of Science and Technology.
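
    The two-part structure can be made concrete with a short sketch that splits a DOI string at the first slash and builds a link via the doi.org resolver. The names `Doi` and `parse_doi` are illustrative; the check is deliberately loose and does not percent-encode special characters in the suffix.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str
    suffix: str

    @property
    def url(self) -> str:
        """Resolver link for this DOI."""
        return f"https://doi.org/{self.prefix}/{self.suffix}"


def parse_doi(text: str) -> Doi:
    """Split a DOI string into prefix and suffix at the first slash."""
    cleaned = text.strip()
    # Accept an optional leading "doi:" label as in the example above.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    prefix, sep, suffix = cleaned.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return Doi(prefix, suffix)


doi = parse_doi("DOI: 10.1000/123456")
print(doi.prefix, doi.suffix, doi.url)
```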

  • Embargo

    A (temporal) embargo defines a period in which only the description (metadata) of the research data is publicly accessible, but not the associated data. An embargo can be applied if research data (e.g. as part of a peer review process) are to be published with a time delay.
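
    As a minimal sketch, an embargo can be modelled as a date before which only the metadata are shown. The class and field names are hypothetical, and treating the end date itself as the first accessible day is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    title: str  # part of the metadata, always public
    embargo_until: Optional[date] = None  # None means no embargo

    def data_accessible(self, today: date) -> bool:
        """The data become accessible once the embargo date is reached."""
        return self.embargo_until is None or today >= self.embargo_until


dataset = Dataset("Example survey", embargo_until=date(2030, 1, 1))
print(dataset.data_accessible(date(2029, 12, 31)))  # still under embargo
```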

  • Enhanced Publication

    The term 'Enhanced Publication' describes the electronic publication of a research article, which is linked to the corresponding publicly accessible digital research data.

  • File Format

    The file format is key to the long-term readability of digital data. File formats differ in how widespread and how well documented they are. Some are “open”, meaning that their exact specifications are public. Others are proprietary and hence producer-dependent; in these cases, specifications are often not public. The rarer a format and the less known its exact specifications, the higher the probability that in just a few years no up-to-date software will be available that can open and read the files. If you are going to archive data for the long term, try to convert the files into open and widely used standard formats. The RADAR project provides a list of suitable formats.

  • Good Scientific Practice

    The rules of Good Scientific Practice serve as orientation within the framework of scientific work processes. In Germany, these rules can be found, for example, in the recommendations of the Deutsche Forschungsgemeinschaft (DFG) on ensuring good scientific practice. Recommendation 7 states that "primary data should be kept for ten years as a basis for publication on durable and secure media in the institution where they originated". This should ensure that research results are verifiable. Publication of the data also promotes the re-usability of the research data.

  • Harvesting (Metadata)

    Harvesting describes the systematic and automated collection and processing of metadata from databases, repositories and other digital sources. The visibility, discoverability and re-usability of published research data can thus be increased.
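
    One widely used protocol for metadata harvesting is OAI-PMH; the glossary text does not name a protocol, so its use here is an assumption. The sketch below builds a `ListRecords` request and extracts Dublin Core titles from a response. The endpoint URL is hypothetical, and a shortened, invented response stands in for a network call.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Namespace of Dublin Core elements inside an oai_dc record.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> List[str]:
    """Collect all Dublin Core titles from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//dc:title", NS) if el.text]


# Shortened, invented response used instead of a real network request.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset A</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(extract_titles(SAMPLE))
```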

  • Legal Protection for Databases

    The legal protection for databases includes an ancillary copyright which guarantees the investors who financed the creation of a database the commercial exploitation rights for 15 years. Database protection does not protect the content of the database (which may be subject to general copyright) but its compilation. It only applies if a “major investment” in terms of money, time, labour, etc. was necessary to reach the “threshold of originality”. The ancillary copyright is based on Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases.

  • Long-term archiving

    The long-term archiving of research data is a procedure that keeps data available and interpretable for an indefinite period of time, beyond technological and socio-cultural changes. In contrast to data backup, long-term archiving must keep the data decodable and readable at all times. This may require, for example, migrating the data to other file formats.

  • Metadata

    Metadata is often referred to as 'data about data' and is used to categorize and characterize the various information about digital objects. Technical metadata includes information on data volume and data format, for example, and is of central importance for sustainable data storage. Descriptive metadata (also called content metadata) provides information on the (e.g. scientific) content of digital objects and thus determines their discoverability, referencing and reusability.
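
    The distinction between technical and descriptive metadata can be sketched as a small data structure. The field names below are illustrative and do not follow any particular metadata standard; the values are invented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TechnicalMetadata:
    file_format: str  # e.g. a media type
    size_bytes: int   # data volume


@dataclass
class DescriptiveMetadata:
    title: str
    creators: List[str]
    keywords: List[str] = field(default_factory=list)


@dataclass
class MetadataRecord:
    technical: TechnicalMetadata
    descriptive: DescriptiveMetadata


record = MetadataRecord(
    technical=TechnicalMetadata(file_format="text/csv", size_bytes=20480),
    descriptive=DescriptiveMetadata(
        title="Example measurement series",
        creators=["Example, Alice"],
        keywords=["temperature", "sensor"],
    ),
)

# Serialise the record, e.g. to store it alongside the data file.
print(json.dumps(asdict(record), indent=2))
```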

  • Persistent Identifier

    A persistent identifier is a permanent identifier of a digital resource. It identifies the resource permanently and unambiguously and is characterized by a strict separation between the resource itself and the reference to its location. A Digital Object Identifier (DOI) is one example of a persistent identifier.

  • Primary Data

    The term 'primary data' is often used synonymously with 'research data' (see Research data). In some cases, however, it refers only to raw data and not necessarily to processed research data.

  • Repository

    A repository is a document server that stores and maintains digital scientific objects (e.g. journal publications or research data) for an unlimited period of time. It also offers the possibility of making them available, citable and reusable online for the worldwide (specialist) community. There are institutional repositories which members of the respective institution can use for publication. Disciplinary repositories offer publication opportunities for researchers of one discipline.

  • Research data

    Research data are all data that arise in the course of scientific work. They form the basis of current and potential future scientific findings.

  • Research data management

    Research data management includes the organization and management of digital data gathered and generated during a research process. It covers all activities related to the planning, generation, documentation, analysis, storage, archiving and publication of digital data.