This repository has been archived by the owner on Dec 25, 2020. It is now read-only.
Data Sources
Peter Rowland edited this page Feb 12, 2020 · 3 revisions
Privacy Impact Assessments (PIAs) and System of Records Notices (SORNs) are the primary sources of data we process and analyze. For Phase 3, we are working with GSA's PIAs and SORNs.
GSA publishes all system PIAs on its website (GSA PIAs). All agencies are required to have a page on their websites to publish PIAs.
PIAs are available in PDF only.
GSA uses a template for PIAs. The sections that we extract information from are:
- 3.2: Personally Identifiable Information (PII)
- 1.1: Purpose of collection
- 1.5: Retention Policy
- 4.2: Sharing
- 1.3: SORN-ID
Note: Older PIAs are not in this format, and in some cases the relevant information is contained in other sections of the document.
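As a rough illustration of pulling these sections out of a templated PIA, here is a minimal sketch. It assumes the PDF has already been converted to plain text by some other tool and that each template question starts a line with its section number (for example `1.1` or `1.1:`). The function name `extract_pia_sections` and the short keys in `PIA_SECTIONS` are hypothetical, not names from this project. Older PIAs that do not follow the template would need different handling.

```python
import re

# Section numbers come from the list above; the short keys are hypothetical labels.
PIA_SECTIONS = {
    "3.2": "pii",
    "1.1": "purpose",
    "1.5": "retention",
    "4.2": "sharing",
    "1.3": "sorn_id",
}

# Assumption: a template heading begins a line with "N.N", optionally followed by ":" or ".".
HEADING = re.compile(r"^\s*(\d+\.\d+)\b[:.]?\s*(.*)$")


def extract_pia_sections(text):
    """Return {label: body text} for the template sections listed in PIA_SECTIONS."""
    sections = {}
    current = None  # label of the section being collected, or None to skip lines
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # A new numbered heading ends the previous section.
            current = PIA_SECTIONS.get(match.group(1))
            if current is not None:
                sections[current] = []
            continue
        if current is not None:
            sections[current].append(line)
    return {label: "\n".join(lines).strip() for label, lines in sections.items()}
```

Because the heading test is purely positional, a body line that happens to begin with something like `2.5` would be mistaken for a heading, so real documents would likely need a stricter pattern.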
SORNs are available in both PDF and machine-readable (though not structured) XML.
The sections we extract information from are:
- SYSTEM NAME
- CATEGORIES OF RECORDS IN THE SYSTEM
- PURPOSE
- RETENTION AND DISPOSAL
- ROUTINE USES OF RECORDS MAINTAINED IN THE SYSTEM INCLUDING CATEGORIES OF USERS AND THE PURPOSES OF SUCH USES
- We also get the official document title, which is usually the first element in the `<PRIACT>` section.
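
The SORN extraction could be sketched along the following lines using Python's standard `xml.etree.ElementTree`. Only the `<PRIACT>` element name comes from this page. The sketch assumes that section headings appear as `HD` elements and body paragraphs as sibling `P` elements directly under `<PRIACT>`, which should be checked against real files. The function name `extract_sorn_sections` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Headings from the list above, matched by prefix so that a trailing colon or
# the longer "ROUTINE USES ... INCLUDING ..." wording still matches.
SORN_HEADINGS = [
    "SYSTEM NAME",
    "CATEGORIES OF RECORDS IN THE SYSTEM",
    "PURPOSE",
    "RETENTION AND DISPOSAL",
    "ROUTINE USES OF RECORDS MAINTAINED IN THE SYSTEM",
]


def element_text(element):
    """Collect all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def extract_sorn_sections(xml_text):
    """Return (title, {heading: body text}) from the first PRIACT element."""
    root = ET.fromstring(xml_text)
    priact = root if root.tag == "PRIACT" else root.find(".//PRIACT")
    if priact is None:
        return None, {}

    children = list(priact)
    # Per the note above, the title is usually (not always) the first element.
    title = element_text(children[0]) if children else None

    sections = {}
    current = None
    for child in children:
        text = element_text(child)
        if child.tag == "HD":  # assumed heading element name
            upper = text.upper()
            current = next((h for h in SORN_HEADINGS if upper.startswith(h)), None)
            if current is not None:
                sections.setdefault(current, [])
        elif current is not None and text:
            sections[current].append(text)
    return title, {heading: "\n".join(body) for heading, body in sections.items()}
```

Since the XML is machine-readable but not structured, the section boundaries here are inferred purely from the order of sibling elements. Any heading not in `SORN_HEADINGS` ends the current section and its paragraphs are skipped.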