
April 27, 2004

MPC 4.28.04

Working on revision of metadata coders' manual.

It needs to be
-up to date - I'll need to find out what's no longer true of NHGIS practices
-made into html - easier to create than word files, easily located later, better suited to the kind of use it will get (section by section, rather than one long read)
-have repetitive content removed (b/c originally in 7 diff. word files; overlap ensued)
-illustrations / examples added - going to stick to a single example all the way

April 22, 2004

MPC 4.22.04

Ok, in addition to continuing work on the CBP descriptions myself, I'll now also begin revising the metadata coders' manual in anticipation of new hires in June and prepping a very short "we are here" for the MTAG meeting 4/30 and 5/1.

Today, for a change of pace, I'll start looking at the manual...

April 20, 2004

MPC 4.20.04

DDI notes: generic elements for creating glossaries do not go in otherMat (section 5); they go in section 2...this is not clearly indicated in the DDI Tag Library. Sooooo....I moved that part into section 2 for CBP...

-after seeing the nCubes realize that we need to include the SIC code number in the label...

To Do:
-adjust SICs as described above
-modify nCubes as in minicbp.xml
-add measurement elements to all the nCubes


It seems like my allergies are worse this year than last, but of course, I can't remember last year in any detail (at least not w/respect to allergies). Therefore, I'm going to try writing down that this year kinda sucks so I'll at least be able to compare *next year*...
Nights interrupted by allergies since mid-March: 10, maybe up to 15.
Days using eyedrops and saline solution: 5ish
Days having to use decongestants all day and night: at least 20

April 19, 2004

MPC 4.19.04

Useful Clarifications:

NHGIS specs:
-everything must be in metadata provided it's used in an nCube; otherwise, it needn't be included

-catValu has to be what you actually find in a file, so for the suppression flags, the catValu has to be a letter or a blank

-use CoordValRef to reference big standard codes like SICs in locMap; gets around having to have 80 zillion re-iterations of the coordNos & coordVals when such codes are included in nCubes

-basic variables will be "total mid-march employment", etc. Basic nCube will be SIC x "total mid-march employment", etc. Class size nCubes will be SIC x "total mid-march employment" x Class size - would be rep. on the page as

Establishments with 1-4 Employees
--SIC | Total Mid-March Employment | Total Annual Payroll | etc.

-change geo in US and state to combine us and st geo; pull census code forward, don't worry about locMap (handled by tools; although will need to be tweaked a bit)
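A rough sketch of the nCube shapes described above (SIC x measure for the basic nCube, SIC x measure x class size for the class size nCubes). The measure names and the "1-4" class come from these notes; the SIC placeholders, the second class size, and all function names are assumptions made up for illustration, not NHGIS code.

```python
from itertools import product

# Measure names and the "1-4" class come from the notes above.
# The SIC codes and the second class size are placeholders for illustration.
SICS = ["SIC_A", "SIC_B"]
MEASURES = ["Total Mid-March Employment", "Total Annual Payroll"]
CLASS_SIZES = ["1-4", "5-9"]


def basic_ncube(sics, measures):
    """Cells of the basic nCube: one per (SIC, measure) pair."""
    return list(product(sics, measures))


def class_size_ncube(sics, measures, class_sizes):
    """Cells of a class size nCube: one per (SIC, measure, class size)."""
    return list(product(sics, measures, class_sizes))


def page_header(class_size, measures):
    """Header lines for one class size 'page', as laid out in the notes."""
    return [
        f"Establishments with {class_size} Employees",
        "--SIC | " + " | ".join(measures),
    ]
```

Each class size adds a full SIC x measure slab, which is why the page display groups rows under one "Establishments with ..." heading per class.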

NHGIS procedures:
-coders make dataDscr first, which is run through tools to make sure that the display shell will be generated correctly in the access software
-coders needn't stick to organization of data in original files - e.g. in CBP state and county files, one/two geo fields are at the end of the data. Since the data file will likely be restructured, var and nCube creation can be independent of the physical placement of variables.
-proofer does additional cleanup as needed.
-NHGIS staff and/or proofer handle section 3, which is used to fill in values into the display shell.
-fileTxt has to be written - typically manually; mostly NHGIS staff, as the original data structures are often dumped b/c untenable. Thus, fileTxt can't be written until restructured form is known/decided on.
-the locMap is done in two parts: non-nCubes and nCubes - presumably after fileTxt has been done so that record length, file widths, etc. are known.

Still unsure about:
-there are base versions of sect 1 & 2; these are customized as needed and are done last (by proofer or coders?)
-glossary is created by coders? Isn't included in metadata; not covered in NHGIS Metadata Coders Manual...

April 15, 2004

MPC 4.15.04

Yesterday I said:

-There was a suggestion that perhaps the US and state files could be treated as two recGrps of a single file, but that won't work because they each have to have their own locMaps. locMap is non-repeatable and at the same level as fileTxt (which contains the recgrp element). Therefore, each file type needs its own complete .

Ok, that was wrong - forgot that you could have one locmap, one set of dataItems and then multiple physLoc elements differentiated by references to the recgrp they belong to.
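To keep the corrected picture straight, here is a minimal sketch of one locMap holding one set of dataItems, where each dataItem carries several physical locations told apart by the record group they refer to. The class names, field names, record group IDs and positions are all hypothetical stand-ins, not DDI attribute names or real CBP layouts.

```python
from dataclasses import dataclass, field


@dataclass
class PhysLoc:
    rec_ref: str    # ID of the record group this location applies to
    start_pos: int  # made-up column position, for illustration only
    width: int


@dataclass
class DataItem:
    var_ref: str
    phys_locs: list = field(default_factory=list)

    def location_for(self, rec_ref):
        """Return the physical location that belongs to one record group."""
        for loc in self.phys_locs:
            if loc.rec_ref == rec_ref:
                return loc
        raise KeyError(rec_ref)


# One locMap, one dataItem, two physical locations (US and state record groups).
loc_map = [
    DataItem("V1_1", [PhysLoc("RG_US", 1, 8), PhysLoc("RG_ST", 3, 8)]),
]
```

Asking for `loc_map[0].location_for("RG_ST")` picks out the state-file position without needing a second locMap.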

However, I still don't want to try to combine them for the following reasons:

1. fileTxt really looks for a particular file. Since the CBP files are (at least since 1986) consistent in releasing separate files for each geo level (and each state for the county files), I'd be mis-using the element in order to assign the US and state files as record groups w/in a single fileTxt element.

2. It would be more complicated than three separate codebooks, one for each type of file.

3. It's not clear to me why, if you put US and states together, you couldn't also add in counties.

4. I don't see what we'd gain by using the combination approach. These metadata files are small and if the access system is currently doing single year searches, then what diff. would 3 metadata files make over two?

Now, depending on the answers/responses to the above, I might well change my mind, but I'm going to go ahead w/three sep. metadata files, one each for US, state and county file types.

Added the <notes> to fileTxt for each file type, but I'm not sure why - is this a reflection of something that should be coded in the metadata or is it an aid to the access system? Also, *assume* that the gloss should be in <otherMat>, but not certain.

To do: ask about the issues above, run the metadata files through the proofing software tonight...

April 14, 2004

MPC 4.14.04

-General question: are proofing tools specific to STF files or are they flexible enough to proof other file structures? *appears* to be the case from the draft of the proofing guide...

Done today:
1. fixed locmap for US: added CubeCoord IDs, changed "TOTFLAG" from nCube to plain var b/c it is...
2. fixed locmap for state files; same story
3. coded the glossary

-There was a suggestion that perhaps the US and state files could be treated as two recGrps of a single file, but that won't work because they each have to have their own locMaps. locMap is non-repeatable and at the same level as fileTxt (which contains the recgrp element). Therefore, each file type needs its own complete .

Still to do:

-add the notes elements to each filetxt; separate the US/state to get 3 files total.

April 13, 2004

MPC 4.13.04

MPC monthly mtg. this morning, then mostly stumbling around trying to figure out the network, remember UNIX and so on...
spent some time looking at the Excel tracking file
-working on the US/state locMaps

April 12, 2004

MPC 4.12.04

1. Added FIPS and Census State Codes to county and state/us.
2. Added CubeCoord IDs to county
3. Got locmap finished for county
4. Added in SIC codes.
5. Added FIPS county codes; adjusted the naming convention slightly - because there's one file for each state, the fact that county codes repeat wasn't important for users of the CD. However, for a single metadata file to describe all the counties, there has to be a way to indicate to which state a particular county "001" belongs. So,
catValu ID = variable #_category #_FIPS/State

<catgry ID="C2_3165">
<catValu ID="C2_3165_56">001</catValu>
<labl level="catgry">Albany County</labl>
</catgry>

<catgry ID="C2_3036">
<catValu ID="C2_3036_54">001</catValu>
<labl level="catgry">Barbour County</labl>
</catgry>

Didn't want to mess up the category numbering or orphan the catValu IDs from the catgrys to which they belong, hence the result above...
6. Added in Census County Codes (which appear to be identical to FIPS) following pattern above, but with Census State codes rather than FIPS.
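The catValu ID pattern from item 5 (variable #_category #_FIPS/State) can be sketched as a pair of small helpers. The function names are mine, and the "C" prefix is simply read off the two examples above; treat this as an illustration of the pattern rather than an NHGIS tool.

```python
def catgry_id(variable_no, category_no):
    """Category ID: 'C' + variable number + '_' + category number."""
    return f"C{variable_no}_{category_no}"


def catvalu_id(variable_no, category_no, state_code):
    """catValu ID: the category ID plus '_' + state code.

    state_code is kept as a string so leading zeros are not lost.
    """
    return f"{catgry_id(variable_no, category_no)}_{state_code}"


# Both counties have the value "001"; the state suffix keeps their IDs apart
# while the catValu ID still starts with the ID of the catgry it belongs to.
albany = catvalu_id(2, 3165, "56")
barbour = catvalu_id(2, 3036, "54")
```

Because the catValu ID is built from the catgry ID, the category numbering stays untouched and no catValu is orphaned from its category.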

Still to do:
CubeCoord IDs for us/state, move glossary info into glossary section for all, work on filetxt for all...reassess when these tasks are finished.

April 11, 2004

The Tulips Survived!

We had to dig the tulips out of frozen soil and in a hurry. I had hoped that they would make it, but I was a little worried. However, as you can see, they're actually blooming. Not well, of course, but I think that there's hope for next year. I'm also relieved to see that in spite of being parked on by a backhoe and compressed heavily, the daffodils are coming right along...



April 9, 2004

MPC 4.9.04

CBP Metadata File Progress:

1. renamed data items, variables and nCubes to fit NHGIS conventions for all geo levels

2. revised establishment class size into 13 individual variables for county file; revised/added nCubes as needed.

3. removed SIC dimension from county nCubes because I would have ended up trying to place multiple data items within the same physical location. (now beginning to understand why it's tricky to think of SICs as being the outer layer of a nested nCube. am going to treat them just like geo variables on the loc map).

4. Still to do: add in FIPS state and county codes, Census state and county codes, SIC codes, CubeCoord IDs, move glossary info into glossary section, work on filetxt for all...reassess when these tasks are finished.

April 7, 2004

MPC 4.7.04

What must a metadata file have before handoff to programming?

What I *think* I've heard over the last two days:

1. Current programming structure does not use reference files - if a particular piece of information will be needed for programming uses, then that information must be in the metadata file (for example, if the SIC codes are needed to create a selection screen, then all the SIC codes must be explicitly added to the metadata).

2. Must follow NHGIS naming conventions - see http://www.pop.umn.edu/~wlt/manual/Manual-Section%204.doc
PROBLEM: nCube convention is "N + table number". There's only one table per filetype; that would make all the nCubes "N1". Will use same convention as variables: "Variable= V + table number + underscore + incremental number"

3. Geog attribute set to "Y" for appropriate elements

4. Temporal attribute set to "Y" for appropriate elements

5. In filetxt, must indicate type (geo, topic, date) and subject (0,1,2)

6. Glossary information must be in generic elements as described in GLOSS-Instructions.txt (3/5/04)

7. Must have NHGIS internal elements "lineno" and "fill"
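The naming workaround in item 2 can be sketched as follows. The variable pattern (V + table number + underscore + incremental number) is from the note above; the helper names are hypothetical, and the "N" prefix for nCube IDs is an assumption, since the note only says nCubes will follow the same pattern as variables.

```python
def element_id(prefix, table_no, seq_no):
    """Build an ID as prefix + table number + '_' + incremental number."""
    return f"{prefix}{table_no}_{seq_no}"


def variable_id(table_no, seq_no):
    """Variable ID per the convention quoted in item 2."""
    return element_id("V", table_no, seq_no)


def ncube_id(table_no, seq_no, prefix="N"):
    """nCube ID using the same pattern; the 'N' prefix is an assumption."""
    return element_id(prefix, table_no, seq_no)


# With one table per file type, plain "N + table number" would give every
# nCube the same ID; the incremental number keeps them distinct.
ncube_ids = [ncube_id(1, n) for n in range(1, 4)]
```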

Next: make CBP metadata compliant; see if it works...

"Section 3.1 discussion.doc" from 10/8/03 is a record of how filetxt is being used;
"StrctFunc.doc" is an explanation of how variations in file/description structure will affect record identification, selection, display/extraction

Wondering where I am?

Put my schedule up on the GPL web site.

April 6, 2004

MPC 4.6.04

Questions for mtg. on 4/7/04 re whipping the existing CBP metadata into shape:
-What's the minimum needed to have a file be ready for handoff to data access?
-took a prelim. look at final STF1 and can see that naming conventions have to change, but what else?
-What should I do w/the SIC codes?
-If I treat the SIC/NAICS codes as regular vars rather than StdCatgry, then we need clean copies of each set of codes as a ref. file for programming purposes, right?
-Would those side files need to be in DDI themselves?

Things for Amy to check on:

-how far back do machine readable CBP files go? Assuming just 1977 for time being, need to check documentation (ICPSR or geostat?) to see how much diff. there is from 77 to 86; similarly need to look at doc. for 88-98 for differences. Mostly differences should be trivial - swapping out of one industrial classification for another (or same class., diff. version). However, have vague memory of actual content diff. from 87 to 88...

April 5, 2004

MPC Day 1.2

-had a look at a manual-in-progress covering the steps for proofing metadata files. Pretty straightforward; however, noticed that two variables are being added that are not std. DDI: Lineno and fill. I can make a pretty good guess as to what they're used for, but, at least for fill (and using STF1's final metadata file), I'm not sure why it only appears at the beginning (there are often lots of blank spots in a given record). I can also imagine why lineno isn't showing up in an nCube, but shouldn't it & fill be showing up in the locMap? It's referring back to the locMap...

-leaving that aside, it looks like the main thing I'd need to do to the CBP metadata file is change the naming conventions to match NHGIS. I could add a lineno var, but not sure it's necessary since I don't know how it's being used.

MPC Day 1


-make sure progress spreadsheet is clear to me
--think it is clear; question is whether it's current or not
-get a walkthrough of what happens to the metadata files when they're handed over to the programmers (there is documentation on the NHGIS web site, but it's out-of-date to some degree; need to find out how much)
-check out the proofing that the metadata files go through - does this happen *before* being handed off to programmers? What if an error is missed and a file needs to be resubmitted?
-possibly prepare a session on STF file structure
-eventually wade into the temporal linking system (how the system knows St. Louis county in 1940 is the same as St. Louis county in 1950)
-in the course of the year, get the CBP data and descriptions from my project into the NHGIS system.

General tasks for the year:
-document processes
-manage movement of completed metadata into system
-provide explanations of file structures as needed
-possibly hire/sup. metadata coders

First blossoms!

love the crocuses....

What happened to the tulips?

Having dug up the tulips, I realized I didn't know which ones were which. So, since it was easier, I decided to keep them in containers until they bloomed so that I could replant them properly. For the time being they're on our newly tidy front porch/greenhouse:

April 1, 2004

Gettin' Political

If there's one thing that spending quality time with government publications does for a person, it's that it gives her a deeply skeptical attitude towards the government.

It's been postulated over the last year and a half, with varying degrees of subtlety, that to be against the war in Iraq is to a) hate America, b) love Saddam Hussein and c) generally be a clear indication that one is just inherently, irredeemably evil.

However, many of us who oppose the current war in Iraq do so because we've always thought that propping up Hussein was a bad idea and bombing the daylights out of the Iraqi people doesn't seem like a good way to make up for past US policy. The US had a lot of other options for dealing with Iraq, many of which could have still involved armed intervention, but without alienating our allies or killing yet more Iraqi people. But we didn't take those options.

Why? Because the current administration is made up of many of the same people who were in the Reagan and Bush 1 administrations. We know, based on our familiarity with just the stuff the govt. wants to publish, that the U.S. government has been engaged in a jaw-droppingly cynical and immoral policy towards the people of Iraq since the first Reagan administration.

When you factor in declassified documents and documents obtained through the Freedom of Information Act, it's obvious that this war is not about justice or freedom or democracy or domestic security or any of the other warm fuzzies the Bush admin. has been claiming. It's about power and dominance; the welfare of Iraqi and US citizens be damned.

This series of posts will back up these assertions with cites to, analysis of and, if I get sufficiently energetic, digitized copies of those government publications that have led me to these conclusions.