How to define tidy data if there is repeated measures?
Some time ago I read R for Data Sciece and there is the following definition of tidy data:
There are three interrelated rules which make a dataset tidy:
Each variable must have its own column.
Each observation must have its own row.
Each value must have its own cell.
Back then, the idea seemed quite reasonable but now the concept seems not that consistent to me. In chapter 12 it is said that table1
is tidy and this is how it looks like:
I would expect the country to be the observation unit and therefore, countries must not appear in multiple rows. I expect the following to be the tidy data table of table1
:
The problem with my solution seems to be that we have values (of the year) as part of variable names. On the other hand, the dataset table1
as it is suggested to be tidy has no obvious observation unit in my understanding. We could say that the combination of the two columns country
and year
forms the observation unit but there is nowhere a definition of such a rather complex observation unit (not in the book and not in the publication on tidy data).
Topic data time-series
Category Data Science