« Literature reviews | Main | Longitudinal data (change scores) »

Relational database dimensions (Excel)

Because Excel allows you to enter anything in any cell, it can contain three or more dimensions of information 'smashed' into two dimensions. This can make for pretty tables, if you are going to do a statistical analysis remember that each row and each column should contain only one type of information.

EXAMPLE: I received an Excel spreadsheet of study data. The first column is variable names, the second is the first participant's day 1 data. the third is participant's day 2 data, the fourth is day 3 data. So far so good, because row 1 is always the ID, row 3 is whether juice was consumed that day, etc. Column five is the sum of the 3 days of data, so it does not match the other three.

ID#_    1   1   1   1   2   2   2   2
day_    1   2   3   -   1   2   3    -
juice   y   n   y     y   y   y   3
milk     n   n   n   0   y   y   y   3

Because it was a small data set I simply deleted the summary columns. If I had been asked earlier I would have suggested putting running totals in additional rows below the day's variables rather than adding columns with different meanings.