Checklist
This checklist helps ensure data is ready for analyses at SQUARE.
To tidy your data, you may find the presentation on data representation helpful. It focuses on R, but its main ideas apply equally well in Excel.
Ethical
- Data is shared through the SQUARE SharePoint
- Identifiers are removed (and stored elsewhere)
Tables
- Each table is in its own file or tab, with nothing else in it, saved as *.txt, *.csv, *.sav or *.xlsx
- When there are multiple tables, each table name reflects its research unit (observation, patient, ...)
Codebook / Data property table
- Metadata: includes all relevant information not intuitively clear from the data
- Per variable with working name:
  - name: comprehensible name
  - scale: numeric - ordered - nominal - ...
  - role: dependent - independent - manifest with link to latent - ...
  - comment: additional information for interpretation
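As a minimal sketch, assuming Python and a codebook kept as one record per variable, the fields listed above could be stored and checked as follows. The variable names (`trt`, `age`), the field name `working_name`, and both helper functions are hypothetical and only serve as illustration.

```python
# Hypothetical codebook: one record per variable, with the fields from the checklist.
codebook = [
    {"working_name": "trt", "name": "treatment group", "scale": "nominal",
     "role": "independent", "comment": "A = control, B = intervention"},
    {"working_name": "age", "name": "age at inclusion (years)", "scale": "numeric",
     "role": "independent", "comment": ""},
]

REQUIRED_FIELDS = ("working_name", "name", "scale", "role", "comment")


def missing_codebook_entries(data_headers, codebook):
    """Return the data columns that have no record in the codebook."""
    described = {entry["working_name"] for entry in codebook}
    return [h for h in data_headers if h not in described]


def incomplete_entries(codebook):
    """Return the working names of records that lack a required field."""
    return [entry.get("working_name", "?") for entry in codebook
            if any(field not in entry for field in REQUIRED_FIELDS)]
```

Calling `missing_codebook_entries` with the column headers of the data table gives a quick list of variables that still need to be documented.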
Columns
~ properties of research units
- Each column relates to a particular property (e.g., treatment)
- A column contains data of one type only (except for missing values)
- Each column has a unique header (column name)
- Column headers are consistent, also when representing a pattern
- Column headers are short, simple (no special symbols) and at least minimally meaningful
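The header rules above could be checked automatically. The following is a rough sketch in Python, not a prescribed SQUARE tool: the function `check_headers`, the length limit of 20 characters, and the reading of "simple" as letters, digits and underscores are all assumptions.

```python
import re
from collections import Counter

# Assumption: "simple" means letters, digits and underscores, starting with a letter.
SIMPLE_HEADER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")
MAX_LENGTH = 20  # assumed limit for "short"; adjust as needed


def check_headers(headers):
    """Return a list of (header, problem) pairs; an empty list means no problems found."""
    problems = []
    counts = Counter(headers)
    for header in headers:
        if counts[header] > 1:
            problems.append((header, "not unique"))
        if not SIMPLE_HEADER.match(header):
            problems.append((header, "contains special symbols or spaces"))
        if len(header) > MAX_LENGTH:
            problems.append((header, "too long"))
    return problems
```

Headers that follow a consistent pattern, such as `score_t1` and `score_t2` for repeated measurements, pass this check, whereas a header like `weight (kg)` is flagged.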
Rows
~ individual research units
- Each row relates to a particular research unit (e.g., patient)
- Each row is uniquely identifiable (by an ID or by some other unique key)
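Whether rows are uniquely identifiable can be verified by looking for repeated key values. The sketch below assumes Python and rows read as dictionaries (for example with `csv.DictReader`). The function `duplicate_keys` and the column names in the usage note are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key values that occur in more than one row.

    rows: list of dicts, one per research unit.
    key_columns: the column(s) that together should identify a row.
    """
    keys = [tuple(row[col] for col in key_columns) for row in rows]
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]
```

For repeated measurements, a combination such as `["patient", "visit"]` can serve as the key. An empty result means every row is uniquely identified by the chosen columns.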
Cells
~ property of a particular research unit
- Each cell relates a property to a research unit
- Each cell contains one and only one piece of information
- The same decimal indicator (point or comma) is used throughout
- Missing values are consistently registered
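Consistency of the decimal indicator and of missing-value codes can be inspected per column. The following is a minimal sketch, assuming Python and cell values read as text. The set `MISSING_CODES` and both functions are illustrative assumptions and should be adapted to the codes actually used in the data.

```python
import re

# Assumption: these strings are candidate codes for "missing"; extend as needed.
MISSING_CODES = {"", "NA", "N/A", "NaN", ".", "-"}

DECIMAL_POINT = re.compile(r"^-?\d+\.\d+$")
DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")


def decimal_indicators(values):
    """Return the set of decimal indicators ('.' and/or ',') found in a column."""
    found = set()
    for value in values:
        text = value.strip()
        if DECIMAL_POINT.match(text):
            found.add(".")
        elif DECIMAL_COMMA.match(text):
            found.add(",")
    return found


def missing_codes_used(values):
    """Return the set of distinct missing-value codes found in a column."""
    return {value.strip() for value in values if value.strip() in MISSING_CODES}
```

If either function returns more than one element for a column, that column mixes conventions and is worth a closer look.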
Data example
- research unit is car type, one per row
- properties in the columns
Metadata example
- for each variable the relevant specifications
- includes the original variable name

