Description of Data File
 

The data file includes four fields: IND, IND1, R, and Z. IND and IND1 hold the four-digit industry categories and, in combination, represent all possible pairs of four-digit industries. The R and Z fields contain the relatedness information. The R field is a percentile rank score. It is a useful representation of the measure because it identifies where each pair lies in the distribution of all pairs. For example, a score of 97 implies that 97 percent of pairs are less related, while 3 percent are more related. The Z field is a z-score, a normalized representation of the measure in which the mean is 0 and the standard deviation is 1. The z-score may be useful for regression analysis, while the percentile score is the most readily interpretable.
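To illustrate how the two representations relate to an underlying score, here is a minimal Python sketch. It is not the procedure used to build the file: the raw values are made up, and the choices of a population standard deviation and a "strictly less than" percentile rank are assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1.

    Assumption: the population standard deviation is used.
    """
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def percentile_ranks(values):
    """Percent of values strictly below each value.

    Assumption: ties and the value itself are not counted as "less related".
    """
    n = len(values)
    return [100.0 * sum(1 for other in values if other < v) / n for v in values]


# Hypothetical raw relatedness scores for five industry pairs.
raw = [0.2, 1.5, -0.7, 3.1, 0.9]

z = z_scores(raw)          # comparable in role to the Z field
r = percentile_ranks(raw)  # comparable in role to the R field

for raw_value, z_value, r_value in zip(raw, z, r):
    print(f"raw={raw_value:5.2f}  z={z_value:6.3f}  percentile={r_value:5.1f}")
```

The percentile rank keeps only the ordering of the pairs, while the z-score keeps the distances between them, which is why the latter is the more natural input to a regression.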

 

The file is a comma-delimited flat file and includes the field headings IND, IND1, R, and Z. It contains 160,801 rows and can easily be opened in a standard spreadsheet. Standard database programs, such as Microsoft Access, can also handle the file without difficulty. Each industry pair is represented twice in the file for ease of linking to other data sets. For example, the pair 3001 3002 is also listed as 3002 3001, so either IND or IND1 may be used to link to a particular relatedness score in a relational (SQL) query. The researcher need not worry about which of the file's two industry fields is linked to which of the two industry fields in the other data set.
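As a rough sketch of the linking described above, the following Python reads a file with the same headings and looks up a pair in either order. The sample rows and their R and Z values are invented for illustration, and the function name `load_relatedness` is hypothetical. With the real file, the in-memory sample would be replaced by an opened file handle.

```python
import csv
import io

# Made-up sample in the same layout as the data file (values are not real).
SAMPLE = """IND,IND1,R,Z
3001,3002,97,1.88
3002,3001,97,1.88
3001,3003,40,-0.25
3003,3001,40,-0.25
"""


def load_relatedness(handle):
    """Return a dict mapping (IND, IND1) to (R, Z).

    Industry codes are kept as strings so any leading zeros survive.
    """
    reader = csv.DictReader(handle)
    return {
        (row["IND"], row["IND1"]): (float(row["R"]), float(row["Z"]))
        for row in reader
    }


table = load_relatedness(io.StringIO(SAMPLE))

# Because each pair is listed twice, the lookup works in either order.
r_ab, z_ab = table[("3001", "3002")]
r_ba, z_ba = table[("3002", "3001")]
print(r_ab, z_ab, r_ba, z_ba)
```

Because both orderings are present, no sorting or swapping of the two industry codes is needed before the lookup. The same property lets a SQL join match on IND and IND1 directly.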

 

The file can be downloaded here.

 

If you have any questions about the measure or its use, please contact me at dbryce@byu.edu.

 

 

Copyright © 1996-2012 Brigham Young University. All Rights Reserved.