(Non-data consumers can ignore this message.)
For any consumers of the phase 1 data results: the old format is no longer
being used. Any further updates to the data set (i.e., filtering out dupes
or correcting erroneous markers) will be made to the new-format set and not
to the old one. Currently, there is no difference in content between
america.fix.zip and america.newformat.zip.
The new format is somewhat more verbose and makes the data 2.5 times as
large, as well as long and skinny. The reason for the new format was to
generalize the structure of the records so that I didn't need specialized
parsing for each portion. This is primarily because new bits of
information keep being added to what is stored, and that caused
maintenance problems with the old style.
The file now consists of human-readable, general serializations of nested
vector/hashtable combinations. In some ways it's actually more legible.
The Golem (serialization/deserialization) Java file will be provided along
with the data, should you wish to have a premade/debugged function that
loads the key/value pairs into memory.
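
For anyone who would like a feel for what "nested vector/hashtable
combinations" look like once loaded, here is a minimal sketch in Java. It
does not use the Golem file and does not reproduce the on-disk syntax; the
class name, the field names (id, position, tags) and the values are all
made up for illustration. It only builds one hypothetical record and walks
it, emitting one key/value pair per line, which is roughly why a general
nested layout tends to come out long and skinny.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Vector;

// Illustration only: NOT the Golem class, and not the real record layout.
public class NestedRecordSketch {

    // Build a hypothetical record: a Hashtable whose values are strings,
    // a Vector, or another Hashtable. All names and values are invented.
    public static Hashtable<String, Object> sampleRecord() {
        Hashtable<String, Object> position = new Hashtable<>();
        position.put("lat", "40.0");
        position.put("lon", "-105.0");

        Vector<Object> tags = new Vector<>();
        tags.add("checked");
        tags.add("deduped");

        Hashtable<String, Object> record = new Hashtable<>();
        record.put("id", "42");
        record.put("position", position);
        record.put("tags", tags);
        return record;
    }

    // Flatten any nesting of maps and lists into "path = value" lines.
    public static List<String> flatten(Object node) {
        List<String> out = new ArrayList<>();
        walk("", node, out);
        return out;
    }

    private static void walk(String path, Object node, List<String> out) {
        if (node instanceof Map) {            // Hashtable implements Map
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(join(path, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (node instanceof List) {    // Vector implements List
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(join(path, String.valueOf(i)), items.get(i), out);
            }
        } else {                              // leaf value
            out.add(path + " = " + node);
        }
    }

    private static String join(String path, String key) {
        return path.isEmpty() ? key : path + "." + key;
    }

    public static void main(String[] args) {
        // Hashtable iteration order is unspecified, so line order may vary.
        for (String line : flatten(sampleRecord())) {
            System.out.println(line);
        }
    }
}
```

The same walk works for any depth of nesting, which is the appeal of a
general structure: adding a new field to a record needs no new parsing
code, only a new key.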