July 13, 2006

Writing hierarchical datasets in SAS

Really, I don't expect anyone among my regular readers [of my irregular writing] to be interested in this. But do say in comments if you are ...

I had cause today to have to write a hierarchical dataset. Now, reading a hierarchical dataset into a statistical program is so routine that the internet is rife with examples (SAS, Stata). Perhaps my Google skills are rusty, but not so much help from the internets with writing a hierarchical dataset. (Does anyone else have trouble typing hierarchical? I feel like I'm spelling it wrong at least half the time).

Anyway, I digress. The example code below assumes that you have a variable that uniquely identifies the unit that contains other observations. Or, more specifically you might have a household serial number or a family identification number.

SORT your data. The first step is to sort the data by the household or family serial number. If you have some identifier within the family or household (a person number, unique within the household or family, for example) it helps to sort by that too.

Then it's straightforward

DATA _NULL_ ; /* No need to write a SAS dataset at the same time */
SET <file you are reading from> ;
BY <household or family identification number> ;

FILE "<full path and name of the file you want to write>" ;

IF first.<household or family identification number> THEN DO ;
PUT <first household variable> <format> <begin-column> - <end-column>
<second household variable> <format> <begin-column> - <end-column>
<last household variable> <format> <begin-column> - <end-column> ;

PUT <first person variable> <format> <begin-column> - <end-column>
<second person variable> <format> <begin-column> - <end-column>
<last person variable> <format> <begin-column> - <end-column> ;

Some notes and caveats.

  • If your data are formatted. For example, if you have associated "Minnesota" with the number "27", then by default SAS will put "Minnesota" instead of "27". This is easy to solve with a "FORMAT _all_ ;" statement after the BY statement. This will clear all formats.
  • You can obviously include IF and WHERE statements after the SET statement to write out a subset of the data.
  • By default, character values will be left aligned. The magic internet will help you find out how to align character values where you want them.
  • Numbers will not be zero padded after the FORMAT _ALL_ statement, so you may wish to format all numbers with the appropriate Zw.d format you require.

Good luck. Comments are not expected, but welcome.

Posted by eroberts at July 13, 2006 5:59 PM