
LocMaps for SSTF files

When you make locMaps for SSTF files, there are four parts to the process.
You will map:
A. the variables not in nCubes at the beginning of each line
B. the line numbers at the end of each line
C. the nCubes
D. the filler spaces after the nCubes


You will need:
1. aggdata\\INCOMING\1990-SSF\whichever one you're working on\Docs\HOWTOUSE.ASC (if more than one, use all)*
2. aggdata\\INCOMING\1990-SSF\whichever one you're working on\Docs\IDEN_FTN.ASC (if more than one, use all)**
3. aggdata\\INCOMING\1990-SSF\whichever one you're working on\Docs\TBL_OUT.ASC (if more than one, use all)
4. putty.exe (download from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html)
5. aggdata\Programs\LocMap_normal_var.pl (copy from here to the directory where your file is)
6. aggdata\Programs\LocMap_nCube_3.pl (copy from here to the directory where your file is)
7. TextPad
8. an extra file for storing the locMap parts
9. the xml file for which the locMap is being created
10. an unzipped copy of the data file stored in aggdata\TRANSLATED\1990-SSF, available for viewing.


*HOWTOUSE.ASC has the record layout for the nCubes.
**IDEN_FTN.ASC has the record layout (with field widths, etc.) for the geographic and other non-nCube variables (denoted with an "M" for misc.).


I. Print items 1 and 2, or have them open for viewing.


II. Start putty following the instructions below:
putty.gif
You'll get a black screen asking you to log in. Use your mpc username and password. Then navigate to the directory where the files and programs are with this command:
cd /pkg/aggdata/METADATA/dropbox/amy
If you are not working in dropbox, change as appropriate.


III. As needed, use these UNIX commands:
a1. ls = show me everything in my directory
a2. Hitting the up arrow brings back commands you've already entered in your session. So, if you've run a locMap program once, you don't have to type it again. Just hit the up arrow.
a3. Paste copied text by hitting the right mouse button. Inside putty, highlighting text also copies it.
a4. You can use the Tab key to fill out the rest of a directory name once what you've typed is unique. So "cd /pkg/agg" + the Tab key fills out the rest of aggdata. It only takes one capital M to get METADATA and one lower case d to get dropbox.


IV. Map the variables not in nCubes at the beginning of each line
Start LocMap_normal_var.pl (see below).
Bold text = prompts from the LocMap_normal_var.pl program; the rest are examples of what you would enter.

./LocMap_normal_var.pl
Enter file name: STF3-1-2-3-4.xml
Enter the variable ID of the first and last variable of your range (ex:U18-U20): ***
Input the number of repetitions for record identification strings that occur on EACH physical part of a longer logical record: 0 (or 1 or 16 or whatever) ****
Input the start position for first dataItem: 1 (always)
Input Variable widths:
Variable ID U18: 3 (these numbers come from IDEN_FTN.ASC)
Variable ID U19: 6
Variable ID U20: 2
Would you like to add any FILLS in this range(y/n): n (always n)

***Do not include the FILL or LINENO (LINENO= line number) variables in this group.

****The number of repetitions refers to record segments which are listed in the HOWTOUSE.ASC file. Open the file and do a find for "Segmentation of SSTF{fill in the number} Records". Count the TOTAL number of segments (whether a, b, c or whatever) and that's what you enter here. So a file with 1 A segment and 12 B segments has 13 total and you should enter 13. The image below shows the start of a segment listing (ignore the green text/arrows for now):

howtouse.gif

When you finish, an output file of the pattern "LocMap_normal_var.output" will be created in the same directory.
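
To sanity-check the positions in that output file by hand, the arithmetic can be sketched in a few lines of Python. This is not the Perl program itself, only an illustration of how consecutive start and end positions follow from the widths in IDEN_FTN.ASC. The function name layout_positions is made up for this example, and it assumes endPos is inclusive (as in the DI_LINENO example, where startPos 8172 with width 3 ends at 8174).

```python
def layout_positions(widths, start=1):
    """Return (var_id, startPos, width, endPos) for consecutive fields.

    widths is a list of (variable ID, width) pairs in file order.
    endPos is treated as inclusive, so endPos = startPos + width - 1.
    """
    rows = []
    pos = start
    for var_id, width in widths:
        end = pos + width - 1
        rows.append((var_id, pos, width, end))
        pos = end + 1  # the next variable starts right after this one
    return rows


# Widths from the example session above (U18-U20).
for var_id, start, width, end in layout_positions([("U18", 3), ("U19", 6), ("U20", 2)]):
    print(f"{var_id}: startPos={start} width={width} endPos={end}")
```

With the example widths, U18 covers positions 1-3, U19 covers 4-9 and U20 covers 10-11.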


V. Copy the contents of LocMap_normal_var.output into 8 (your holding file for the locMap parts) and save.


VI. Map the line numbers at the end of each line
-Open the data file.
-Hit the End key.
-Look at the bottom of the TextPad window for the character number (see image).
-That's the line length.
-Scroll down to the last line of the file.
-Note the line number (2043 or 77 or whatever).
-Open 8 (your holding file).
-Copy the last dataitem.
-Change the ID to "DI_LINENO"
-Change varRef to "LINENO"
-Change the endPos to the line length (8174 in the image above)
-Change the width to the number of characters of the last line number (4 for 2043, 2 for 77)
-Adjust the startPos accordingly: startPos = endPos - width + 1 (8172 for an endPos of 8174 and a width of 3).
-Save this dataitem just above the </locMap> tag.

Example:
<dataItem ID="DI_LINENO" source="producer" varRef="LINENO">
<physLoc source="producer" recRef="REC_1" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_2" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_3" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_4" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_5" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_6" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_7" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_8" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_9" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_10" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_11" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_12" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_13" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_14" startPos="8172" width="3" endPos="8174" />
<physLoc source="producer" recRef="REC_15" startPos="8172" width="3" endPos="8174" />
</dataItem>
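
If you would rather not type the physLoc lines by hand, a small Python sketch like the one below can produce them. It is only an illustration, not one of the aggdata programs: the function name lineno_data_item is made up, the last line number 345 is a made-up three-digit value, and it assumes every record segment has the same line length.

```python
def lineno_data_item(line_length, last_line_number, n_records):
    """Build the DI_LINENO dataItem as text.

    line_length      -- character number shown at the end of a line (endPos)
    last_line_number -- line number found on the last line of the data file
    n_records        -- how many REC_# record segments the file has
    """
    width = len(str(last_line_number))   # 4 for 2043, 2 for 77
    end_pos = line_length
    start_pos = end_pos - width + 1      # endPos is inclusive
    lines = ['<dataItem ID="DI_LINENO" source="producer" varRef="LINENO">']
    for rec in range(1, n_records + 1):
        lines.append(
            f'<physLoc source="producer" recRef="REC_{rec}" '
            f'startPos="{start_pos}" width="{width}" endPos="{end_pos}" />'
        )
    lines.append("</dataItem>")
    return "\n".join(lines)


# Line length 8174 and 15 segments as in the example above; 345 is hypothetical.
example = lineno_data_item(8174, 345, 15)
print(example)
```

Paste the printed text just above the </locMap> tag in your holding file, after checking it against the real line length and last line number.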


VI. Map the nCubes
Bold text = prompts from the LocMap_nCube_3.pl program; the rest are examples of what you would enter.
The program goes like this:

./LocMap_nCube_3.pl
Enter file name:
Enter the ID of the first and last nCube of your range (ex: NPB010-NPB015)
Enter recRef:
Enter Start position:
Enter width:

The program ends and an output file is created.

Now, SSTF files are broken into segments. These segments often start and/or end in the middle of an nCube. The program above can't capture this kind of break - it can only work with a whole table at a time. This means you have to construct the locMap segment by segment using the HOWTOUSE.ASC file.

Because the geography variables should all end at position 300, each record segment (which is equal to 1 or more lines in the data file) should start at 301. In other words, multiple data items reference the same positions. This is ok because each data item references a different record group, so each item is still a unique combination of record group and position.

If each segment broke at the end of each table, then you would run the LocMap_nCube_3.pl program once for each record segment. Instead, as you can see below, segments begin and/or end in the middle of tables.

howtouse.gif

Or, to put it this way:

recgroup.gif

LocMap_nCube_3.pl doesn't account for these kinds of breaks. Therefore, not only do you have to run the LocMap_nCube_3.pl program once for each record segment, but you also have to adjust your starting position for each segment so that whatever data item is at position 301 is the one you are supposed to start with.

For example, based on the image above:

The input for record A (REC_1) would be:
range: NPA001-NHA003
record ID: REC_1
start position: 301
width: 9

The input for record B segment 1 (REC_2) would be:
range: NPB001-NPB007
record ID: REC_2
start position: 301
width: 9

NOTE: ALL dataItems starting with the 32nd data item in table NPB007 will be deleted as they are NOT in this segment.

The input for record B segment 2 (REC_3) would be:
range: NPB007-NPB018
record ID: REC_3
start position: 301 - (31*9) = 22
width: 9

NOTE: All dataItems BEFORE NPB007 cell 32 will be deleted (cell 32 should have a start position of 301). All dataItems starting with the 73rd dataItem in table NPB018 will be deleted as they are NOT in this segment.

The input for record B segment 3 (REC_4) would be:
range: NPB018-NPB022
record ID: REC_4
start position: 301 - (72*9) = -347
width: 9

NOTE: Same type of editing as before. Note that dataItem 73 in NPB018 starts in position 301. Because the first 72 cells take up more than 300 characters (72 x 9 = 648), the start position you enter has to be a negative number so that the first cell of this table that belongs to this segment starts at 301. All dataItems starting with the 134th dataItem in table NPB022 will be deleted as they are NOT in this segment.

You repeat this process until you reach the end of the record segments.
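
The start positions in the examples above (301, 22, -347) all come from the same subtraction, which can be sketched in Python as a quick check. This is only an illustration: the function names are made up, and it assumes every cell in the table is 9 characters wide and that the nCube data starts at position 301, as in the examples.

```python
DATA_START = 301  # first position after the geography variables
CELL_WIDTH = 9    # width of each nCube cell in the examples above


def segment_start_position(cells_in_earlier_segments):
    """Start position to give LocMap_nCube_3.pl for a segment.

    cells_in_earlier_segments is how many cells of the first table in
    this segment were already used up by the previous segment.
    """
    return DATA_START - cells_in_earlier_segments * CELL_WIDTH


def cell_start(start_position, cell_number):
    """Position where a given cell of the first table starts (cells count from 1)."""
    return start_position + (cell_number - 1) * CELL_WIDTH


print(segment_start_position(0))   # REC_2: NPB001 starts the segment
print(segment_start_position(31))  # REC_3: 31 cells of NPB007 are in REC_2
print(segment_start_position(72))  # REC_4: 72 cells of NPB018 are in REC_3
print(cell_start(22, 32))          # NPB007 cell 32 in REC_3
print(cell_start(-347, 73))        # NPB018 cell 73 in REC_4
```

The first three lines print 301, 22 and -347, and the last two both print 301, which confirms that the first cell belonging to each segment lands at position 301.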


VII. Map the filler.
On each line, after the nCube data, but before the line numbers will be some blank spaces. We account for these using the FILL variable.
-After creating each locMap segment in the nCube step above, go to the last dataitem.
-Copy this:

<dataItem ID="DI_FILL#" source="producer" varRef="FILL">
<physLoc source="producer" recRef="REC_#" startPos="#" width="#" endPos="#" />
</dataItem>

-Each fill is sequential starting with 1.
-The REC_# should match the one it's filling out.
-The startPos = (endPos of preceding dataitem + 1)
-The endPos = (startPos of the lineno dataitem - 1)
-The width = endPos - startPos + 1 (both end positions count, just as 8172 to 8174 is a width of 3)
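
The FILL arithmetic can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name fill_data_item is made up, the positions in the usage line (8166 and 8172) are hypothetical, and width is counted inclusively (endPos - startPos + 1), matching the DI_LINENO example where 8172 to 8174 has a width of 3.

```python
def fill_data_item(fill_number, rec_number, prev_end_pos, lineno_start_pos):
    """Build one FILL dataItem as text.

    prev_end_pos     -- endPos of the last nCube dataItem in this segment
    lineno_start_pos -- startPos of the DI_LINENO dataItem
    """
    start_pos = prev_end_pos + 1
    end_pos = lineno_start_pos - 1
    width = end_pos - start_pos + 1  # inclusive count of blank characters
    if width < 1:
        raise ValueError("no room for filler between the nCubes and the line number")
    return "\n".join([
        f'<dataItem ID="DI_FILL{fill_number}" source="producer" varRef="FILL">',
        f'<physLoc source="producer" recRef="REC_{rec_number}" '
        f'startPos="{start_pos}" width="{width}" endPos="{end_pos}" />',
        "</dataItem>",
    ])


# Hypothetical segment: nCube data ends at 8166, line number starts at 8172.
item = fill_data_item(1, 2, 8166, 8172)
print(item)
```

In this made-up case the filler runs from 8167 to 8171, a width of 5.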


VIII. Map "FILLER Tables" as filler variables.
In some cases, a record segment ends in the middle of a "FILLER Table". That means that the next segment begins with a FILLER Table and you need to calculate the correct start and end positions for the FILL variable that replaces the table and the first nCube in that segment.

Say you have two segments like this:

Segment 9

Geographic Identification PB40--17 data cells--through
Information PB44--259 data cells

8,165 characters including
5 characters filler

Segment 10

Geographic Identification PB44--227 data cells--through
Information PB45--388 data cells

8,165 characters including
2 characters filler

Check in TBL_OUT.ASC to see if either of the tables at the beginning or end is a FILLER Table. It will look like this:

PB47. FILLER

In those cases, before re-running LocMap_nCube_3.pl, insert another blank FILL variable.

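-The REC_# should match the one it's filling out.
-The startPos = 301
-The endPos = 300 + (the number of data cells for that table x 9). In the example above, it would be 300 + (227 x 9) = 2343, because endPos is the last position the table occupies (as in the DI_LINENO example, where 8172 to 8174 is a width of 3).
-The width = endPos - startPos + 1 (227 x 9 = 2043 in the example)

When you run LocMap_nCube_3.pl now, your nCube range is just PB45 and the startPos is 2343+1 = 2344.
Then clip off the extra dataitems as in the nCube step.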

IX. When you have finished, copy the contents of your holding file (i.e. the whole locMap) into your original xml file immediately above <dataDscr>. Save and email me that you've finished.