Standard Readme
Note
This page is a slightly edited version of the README file written by Jean Roth and located at /disk/aging/medicare/data/README.
Filenames are of the form TYPEpercent_YEAR_divisionFILE. However, Jean Roth is moving these to filenames that include a version, such as TYPEpercentVersion_YEAR_divisionFILE. The old names will still work, because they should be symbolic links pointing to the new filenames once those are added.
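Because the old names are meant to be symbolic links to the versioned names, a script can find out which versioned file an old name points to by resolving the link. Below is a minimal sketch using only Python's standard pathlib. The function name is hypothetical, and so are the example filenames in the note that follows; they are simply made up to fit the patterns above.

```python
from pathlib import Path


def versioned_target(old_name: str) -> str:
    """Return the filename that an old-style name ultimately points to.

    If old_name is a symbolic link (as old names are meant to be once
    versioned files exist), the link is followed; otherwise the name is
    returned unchanged.
    """
    path = Path(old_name)
    if path.is_symlink():
        # resolve() follows the whole chain of links to the real file
        return path.resolve().name
    return path.name
```

For instance, if a hypothetical old name den100_2005_1 were a link to den100j_2005_1, `versioned_target("den100_2005_1")` would return `"den100j_2005_1"`.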
Type
TYPE is [bsf|car|den|dme|hha|hos|hsk|ip|med|op|snf].
- bsf = Master Beneficiary Summary File; enrollment information.
- car = Carrier. Previously known as Physician/Supplier and Part B.
- den = Denominator. An abbreviated version of the enrollment database (EDB), starting in 1991. Contains data on all Medicare beneficiaries enrolled and/or entitled in a given year. nrol is a subdirectory with enrollment summary files.
- dme = Durable Medical Equipment
- hha = Home Health Agency
- hos = Hospice
- hsk = HISKEW, Health Insurance Skeleton Eligibility Write-Off. An abbreviated version of the enrollment database (EDB). Contains data on all beneficiaries ever entitled to Medicare. We have this for 1985-1995.
- ip = Inpatient
- med = MEDPAR, Medical Provider Analysis and Review
- op = Outpatient
- snf = Skilled Nursing Facility
Percent
percent is 100, 20, 05, 01, or 0001. See below for more detail.
The percent files have been made on request. So if a small sample
has not been made from the file you want, email Jean (jroth@nber.org)
to request an extraction.
Version
Version is a letter such as g, h, i, or j.
Year
YEAR is the four-digit year.
Division
division, if applicable, is:
- clms => claims, revcntrs => revenue centers, for HHA, IP, OP, SNF
- clms => claims, lnits => line items, for CAR, DME
FILEs that have been split into divisions can be merged on EHIC and claimindex.
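As an illustration of that merge, here is a small pure-Python sketch of a one-to-many join: each claim row is matched to all of its revenue-center or line-item rows that share the same (EHIC, claimindex) pair. Only EHIC and claimindex come from this README; the function name and the other column names in the usage example are hypothetical, and in practice the files would be merged in Stata or SAS rather than as Python dicts.

```python
from collections import defaultdict


def merge_divisions(claims, details):
    """Attach detail rows (revenue centers or line items) to their claim.

    claims and details are lists of dicts that both carry the keys
    EHIC and claimindex, which together identify a claim. Returns one
    merged dict per detail row; a claim with no detail rows is kept once
    with only its own fields (a left join).
    """
    by_key = defaultdict(list)
    for row in details:
        by_key[(row["EHIC"], row["claimindex"])].append(row)

    merged = []
    for claim in claims:
        key = (claim["EHIC"], claim["claimindex"])
        rows = by_key.get(key)  # .get() does not add empty entries
        if not rows:
            merged.append(dict(claim))
            continue
        for row in rows:
            # claim fields first; detail fields win on any name clash
            merged.append({**claim, **row})
    return merged
```

With two claims for one EHIC, where the first claim has two line items and the second has none, the result has three rows: two for the first claim and one for the second.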
2002-2005 DME files are split by the RIC_CD (Record Identification Code) variable.
In the DME files, RIC_CD takes two values: M and O (capital-Oh).
- M = Part B DMEPOS claim record (processed by DME Regional Carrier) (effective 10/93);
- O = Part B physician/supplier claim record (processed by local carriers, can include DMEPOS services).
DME files split this way are called dmeo or dmem.
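The RIC_CD split can be sketched as a simple routing function. The main point it encodes is that the second value is the letter O, not the digit zero, so anything else is treated as unexpected. This is a hypothetical helper working on records represented as dicts, not code that was used to produce the dmeo and dmem files.

```python
def dme_division(record):
    """Return 'dmem' or 'dmeo' for a 2002-2005 DME record based on RIC_CD."""
    ric = record["RIC_CD"]
    if ric == "M":
        return "dmem"  # Part B DMEPOS claim, DME Regional Carrier
    if ric == "O":     # capital letter O, not the digit 0
        return "dmeo"  # Part B physician/supplier claim, local carrier
    raise ValueError(f"unexpected RIC_CD {ric!r}")


def split_dme(records):
    """Group DME records into the two divisions."""
    out = {"dmem": [], "dmeo": []}
    for record in records:
        out[dme_division(record)].append(record)
    return out
```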
Not all TYPES of files are split into divisions, and not all YEARs of a TYPE will be split into divisions.
File
FILE is 1 to N.
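Putting the pieces together, a filename can be split back into TYPE, percent, optional version letter, YEAR, optional division, and FILE with one regular expression. This is a sketch under assumptions: it expects a bare name with no extension, treats the version as a single lowercase letter right after the percent, and does not cover the dmeo/dmem names or the 1998-2000 Carrier a/b/c/d files. The example names in the note below are hypothetical.

```python
import re

TYPES = "bsf|car|den|dme|hha|hos|hsk|ip|med|op|snf"
PERCENTS = "100|20|05|01|0001"
DIVISIONS = "clms|revcntrs|lnits"

# TYPE, percent, optional version letter, _YEAR_, optional division, FILE
NAME_RE = re.compile(
    rf"^(?P<type>{TYPES})(?P<percent>{PERCENTS})(?P<version>[a-z]?)"
    rf"_(?P<year>\d{{4}})_(?P<division>{DIVISIONS})?(?P<file>\d+)$"
)


def parse_name(name):
    """Split a bare filename into its components, or return None."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    parts = m.groupdict()  # unmatched optional groups come back as None or ""
    parts["version"] = parts["version"] or None
    parts["year"] = int(parts["year"])
    parts["file"] = int(parts["file"])
    return parts
```

For example, `parse_name("ip100_2004_clms1")` gives type `"ip"`, percent `"100"`, no version, year 2004, division `"clms"`, and file 1. The percent is kept as a string so that codes such as `"05"` and `"0001"` keep their leading zeros.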
Sampling Method
Recent denominator, inpatient, outpatient, and medpar files are 100% files.
Jean Roth created the 20%, 5%, 1%, and 0.1% files based on digits from EHIC:
From the earliest files through 2005:
- A 20% file has 0 or 5 in the 9th position of EHIC.
- A 5% file has 05, 20, 45, 70, or 95 in positions 8-9 of EHIC.
- A 1% file has 45 in positions 8-9 of EHIC.
- A 0.1% file has 1545 in positions 6-9 of EHIC.
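The pre-2006 rules translate directly into string checks, keeping in mind that the positions above are 1-based while Python indexes from 0. The sketch below assumes EHIC is a string whose first nine characters are digits; the EHIC values in the usage note are made up. Because every 5% pair ends in 0 or 5, and 1545 ends in 45, the samples nest: each 0.1% EHIC is in the 1% sample, each 1% EHIC is in the 5% sample, and each 5% EHIC is in the 20% sample.

```python
FIVE_PCT_PAIRS = {"05", "20", "45", "70", "95"}


def ehic_samples(ehic):
    """Return the percent codes of the pre-2006 samples an EHIC falls into.

    1-based position 9 is ehic[8], positions 8-9 are ehic[7:9],
    and positions 6-9 are ehic[5:9].
    """
    samples = set()
    if ehic[8] in {"0", "5"}:
        samples.add("20")
    if ehic[7:9] in FIVE_PCT_PAIRS:
        samples.add("05")
    if ehic[7:9] == "45":
        samples.add("01")
    if ehic[5:9] == "1545":
        samples.add("0001")
    return samples
```

For a hypothetical EHIC such as `"000001545A"`, all four rules match, so the result is `{"20", "05", "01", "0001"}`.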
From 2006 to the present:
- The 20% and 5% files are consistent but are now identified by the CMS contractor.
- The 1% file has DD in the 14th and 15th positions of BENE_ID.
- The 0.1% file has DDDD in the 12th-15th positions of BENE_ID.
Note that since we have "only" a 20% Carrier file, the "1%" Carrier file is actually a 0.2% sample for 2006-present. Similarly, the "0.1%" file actually has (0.1 × 0.2)%, or 0.02%, of all Medicare Carrier beneficiaries.
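The arithmetic behind those numbers is a single multiplication: a nominal percentage applied within a 20% base file covers nominal × 20 / 100 percent of all beneficiaries. This sketch assumes, as the note implies, that the BENE_ID-based selection is independent of which beneficiaries are in the 20% Carrier file; the function name is hypothetical.

```python
def effective_pct(nominal_pct, base_pct=20.0):
    """Percent of all beneficiaries covered by a nominal sample taken
    independently within a base file that holds base_pct percent."""
    return nominal_pct * base_pct / 100.0
```

So `effective_pct(1)` is 0.2 (the "1%" Carrier file) and `effective_pct(0.1)` is 0.02 (the "0.1%" Carrier file), both in percent.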
There is a break in the 1% and smaller samples between 2005 and 2006; however, since these files are mostly used for testing, this should be acceptable. I know this is confusing.
The exception is Carrier (Part B) files: files from 1998 on are 20% files.
Earlier, 5% was the maximum. From 1991 to 2000, the 5% files had
05, 20, 45, 70, or 95 in positions 8-9 of EHIC. So the 5%
Carrier files are just symbolic links to those files in the 20pct directory.
Note
The 1985-1990 files had values 00-99 in positions 8-9 of EHIC; however, 05, 20, 45, 70, and 95 each make up 14-16% of the values in EHIC positions 8-9. These appear to be all End Stage Renal Disease (ESRD) beneficiaries.
Also noteworthy about the Carrier files: in 1998-2000 there are two
versions, H and I. The 1998, 1999, and 2000 a files are version H, and
the 1998-1999 b, c, and d files and the 2000 files are version I.
My guess is that, at one time, the 5% version H files were bought.
Later, when it became possible to get 20% files, the 1998-1999
b, c, and d files and the 2000 files were bought. Both H and I are
needed to make a 20% sample for 1998 and 1999. Based on sample size,
it appears that the 2000 files are four 5% draws and that the 2000 a
files are useful only for historical reasons.
2002-2005 Carrier files are split into four 5% draws, f[1-4], based on the
8th and 9th positions of EHIC:
- f1: 05, 20, 45, 70, 95 (~20% of the draw each);
- f2: 00, 10, 15, 25, 30 (~20% of the draw each);
- f3: 35, 40, 50, 55, 60 (~20% of the draw each);
- f4: 65, 75, 80, 85, 90 (~20% of the draw each).
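The four draws partition the twenty two-digit endings that end in 0 or 5, so together they make up the 20% sample, and f1 uses exactly the pairs of the 5% rule. A small lookup captures this. The function name and the EHIC strings in the note are hypothetical, and EHIC is assumed to have digits in positions 8-9.

```python
CARRIER_DRAWS = {
    "f1": {"05", "20", "45", "70", "95"},
    "f2": {"00", "10", "15", "25", "30"},
    "f3": {"35", "40", "50", "55", "60"},
    "f4": {"65", "75", "80", "85", "90"},
}


def carrier_draw(ehic):
    """Return 'f1'..'f4' for a 2002-2005 Carrier EHIC, or None if the
    EHIC is outside the 20% sample."""
    pair = ehic[7:9]  # 1-based positions 8-9
    for draw, pairs in CARRIER_DRAWS.items():
        if pair in pairs:
            return draw
    return None
```

For example, an EHIC ending in ...745A would fall in f1, one ending in ...700A in f2, and one ending in ...789A in none of the draws.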
2002-2005 Carrier and DME files are split into claims (clms) and line items
(lnits) files. The two can be merged using EHIC and claimindex.
Beginning in 2006, the claims data use BENE_ID as the identifier. /disk/agedisk1/medicare/data.NOBACKUP/u/c/100pct/xw
has a crosswalk between BENE_ID and EHIC.
/disk/aging/medicare/data/`PCT'pct/bsf/[CCYY]/1/unique_in_both_ids[CCYY].[dta|sas7bdat]
(where `PCT' is the percent and CCYY is the four-digit year) contains EHIC and BENE_ID
for beneficiaries whose IDs are unique in both EHIC and BENE_ID.
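One plausible reading of "unique in both" is a crosswalk filtered to pairs where each EHIC maps to exactly one BENE_ID and each BENE_ID maps to exactly one EHIC. The sketch below applies that filter to (EHIC, BENE_ID) tuples. It is a hypothetical illustration of the idea, not the procedure actually used to build the unique_in_both_ids files.

```python
from collections import Counter


def unique_in_both(pairs):
    """Keep (EHIC, BENE_ID) pairs in which each ID occurs in exactly one pair.

    pairs is an iterable of (ehic, bene_id) tuples, e.g. crosswalk rows.
    Exact duplicate rows are collapsed first. Returns a dict EHIC -> BENE_ID.
    """
    pairs = set(pairs)
    ehic_counts = Counter(ehic for ehic, _ in pairs)
    bene_counts = Counter(bene for _, bene in pairs)
    return {ehic: bene for ehic, bene in pairs
            if ehic_counts[ehic] == 1 and bene_counts[bene] == 1}
```

An EHIC linked to two BENE_IDs, or a BENE_ID linked to two EHICs, drops out entirely, so the remaining mapping can be used in either direction.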
Last updated 2016-06-24