Data Contract

covid_cases

Info

Information about the data contract

Title
COVID-19 cases
Version
0.0.1
Description
Johns Hopkins University Consolidated data on COVID-19 cases, sourced from Enigma
Links
  blog: https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/
  data-explorer: https://dj2taa9i652rf.cloudfront.net/
  data: https://covid19-lake.s3.us-east-2.amazonaws.com/enigma-jhu/json/part-00000-adec1cd2-96df-4c6b-a5f2-780f092951ba-c000.json
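The `data` link is a virtual-hosted-style HTTPS URL for one object in the `covid19-lake` bucket. The minimal sketch below assumes the host always has the form `<bucket>.s3.<region>.amazonaws.com`. It converts the link to an `s3://` URI and checks it against the `s3-json` server location `s3://covid19-lake/enigma-jhu/json/*.json`. The helper name `https_to_s3_uri` is hypothetical, and Python's `fnmatchcase` is only a stand-in for whatever glob matching the consuming tool actually applies. Note that its `*` also matches `/`.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Location of the s3-json server, copied from this contract.
S3_LOCATION = "s3://covid19-lake/enigma-jhu/json/*.json"


def https_to_s3_uri(url: str) -> str:
    """Hypothetical helper: turn a virtual-hosted-style S3 URL into an s3:// URI.

    Assumes the host looks like <bucket>.s3.<region>.amazonaws.com.
    """
    parsed = urlparse(url)
    bucket, sep, _region_and_domain = parsed.netloc.partition(".s3.")
    if not sep or not parsed.netloc.endswith(".amazonaws.com"):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {url}")
    # The URL path is the object key, including its leading "/".
    return f"s3://{bucket}{parsed.path}"


data_link = (
    "https://covid19-lake.s3.us-east-2.amazonaws.com/enigma-jhu/json/"
    "part-00000-adec1cd2-96df-4c6b-a5f2-780f092951ba-c000.json"
)
s3_uri = https_to_s3_uri(data_link)
print(s3_uri)
print(fnmatchcase(s3_uri, S3_LOCATION))  # True
```

The check confirms that the sample file linked from the Info section is one of the objects the server glob selects.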

Servers

Servers of the data contract

  • Server
    s3-json
    Type
    s3
    Location
    s3://covid19-lake/enigma-jhu/json/*.json
    Format
    json
    Delimiter
    new_line
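The server declares format `json` with delimiter `new_line`, which we read as newline-delimited JSON: each line of a `*.json` object holds one JSON record. The sketch below parses any iterable of lines, such as an open file or a downloaded object body split into lines. Fetching from S3 is left out. The function name and the sample records are hypothetical placeholders and do not come from the dataset.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line of newline-delimited JSON."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        yield record


# Hypothetical placeholder records shaped like the covid_cases model.
sample = io.StringIO(
    '{"admin2": "Example County", "province_state": "Example State", '
    '"country_region": "US", "confirmed": 12}\n'
    "\n"
    '{"province_state": "Example Province", "country_region": "Exampleland", '
    '"confirmed": 7}\n'
)
records = list(iter_ndjson(sample))
print(len(records))  # 2
```

Reading line by line keeps memory flat for large objects, because no single document wraps the records.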

Data Model

The logical data model

covid_cases table
The number of confirmed COVID-19 cases reported for a specified region, with location and county/province/country information.
Field           Type           Description
fips            string         FIPS code (two-digit state code followed by three-digit county code)
admin2          string         county name
province_state  string         province name or state name
country_region  string         country name or region name
last_update     timestamp_ntz  last update timestamp
latitude        double         location (latitude)
longitude       double         location (longitude)
confirmed       int            number of confirmed cases
combined_key    string         combined county, province/state, and country/region names
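The field table maps each column to a logical type, and every field is declared `required: false`. The sketch below is a hypothetical type check that mirrors the table for one parsed record. It relies on two assumptions that the contract does not state. First, `timestamp_ntz` values arrive as ISO 8601 strings with no UTC offset. Second, keys outside the model are ignored. A missing key and a `null` are both accepted.

```python
from datetime import datetime
from typing import Any

# Logical types copied from the covid_cases model.
SCHEMA = {
    "fips": "string",
    "admin2": "string",
    "province_state": "string",
    "country_region": "string",
    "last_update": "timestamp_ntz",
    "latitude": "double",
    "longitude": "double",
    "confirmed": "int",
    "combined_key": "string",
}


def _is_naive_iso(text: str) -> bool:
    """Assumption: timestamp_ntz is an ISO 8601 string without an offset."""
    try:
        return datetime.fromisoformat(text).tzinfo is None
    except ValueError:
        return False


def type_errors(record: dict[str, Any]) -> list[str]:
    """Return one message per field whose value does not match SCHEMA."""
    errors = []
    for field, logical_type in SCHEMA.items():
        value = record.get(field)
        if value is None:
            continue  # every field is optional (required: false)
        if logical_type == "string":
            ok = isinstance(value, str)
        elif logical_type == "int":
            ok = isinstance(value, int) and not isinstance(value, bool)
        elif logical_type == "double":
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif logical_type == "timestamp_ntz":
            ok = isinstance(value, str) and _is_naive_iso(value)
        else:
            ok = False
        if not ok:
            errors.append(f"{field}: expected {logical_type}, got {value!r}")
    return errors


# Hypothetical placeholder records, not taken from the dataset.
good = {"fips": "12345", "confirmed": 5, "latitude": 32.5,
        "last_update": "2020-03-22T23:45:00"}
bad = {"confirmed": "5", "latitude": "north"}
print(type_errors(good))  # []
print(type_errors(bad))
```

Errors are reported in the order the fields appear in the model, so `bad` yields the `latitude` message before the `confirmed` one.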

Quality

SodaCL

checks for covid_cases:
- freshness(last_update::datetime) < 5000d
- row_count > 1000
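The two SodaCL checks can be approximated in plain Python. Soda's `freshness` measures the age of the newest value in the column, that is, the current time minus `max(last_update)`. The check passes while that age stays under 5000 days, which is roughly 13.7 years. `row_count > 1000` simply requires more than 1000 rows. The sketch below is a hypothetical re-implementation for illustration and is not Soda's code. Treating an all-null column as a failure is our assumption. The caller passes a naive `now` so it can be compared with `timestamp_ntz` values.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def freshness_ok(timestamps: Iterable[Optional[datetime]], now: datetime,
                 max_age: timedelta = timedelta(days=5000)) -> bool:
    """Approximates `freshness(last_update) < 5000d`: newest row younger than max_age."""
    present = [ts for ts in timestamps if ts is not None]
    if not present:
        return False  # assumption: no timestamps at all counts as stale
    return now - max(present) < max_age


def row_count_ok(n_rows: int, minimum: int = 1000) -> bool:
    """Approximates `row_count > 1000`."""
    return n_rows > minimum


now = datetime(2024, 6, 27)  # naive, like timestamp_ntz values
print(freshness_ok([datetime(2020, 3, 22), None, datetime(2021, 6, 1)], now))  # True
print(row_count_ok(1000))  # False: the bound is strict
```

Because the threshold is so wide, the freshness check fails only if the newest `last_update` is more than about 13.7 years old.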

Created at 27 Jun 2024 14:50:11 UTC with Data Contract CLI v0.10.8

dataContractSpecification: 0.9.3
id: covid_cases
info:
  title: COVID-19 cases
  version: 0.0.1
  description: Johns Hopkins University Consolidated data on COVID-19 cases, sourced
    from Enigma
  links:
    blog: https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/
    data-explorer: https://dj2taa9i652rf.cloudfront.net/
    data: https://covid19-lake.s3.us-east-2.amazonaws.com/enigma-jhu/json/part-00000-adec1cd2-96df-4c6b-a5f2-780f092951ba-c000.json
servers:
  s3-json:
    type: s3
    format: json
    delimiter: new_line
    location: s3://covid19-lake/enigma-jhu/json/*.json
models:
  covid_cases:
    description: The number of confirmed COVID-19 cases reported for a specified region,
      with location and county/province/country information.
    type: table
    fields:
      fips:
        type: string
        required: false
        primary: false
        unique: false
        description: FIPS code (two-digit state code followed by three-digit county code)
      admin2:
        type: string
        required: false
        primary: false
        unique: false
        description: county name
      province_state:
        type: string
        required: false
        primary: false
        unique: false
        description: province name or state name
      country_region:
        type: string
        required: false
        primary: false
        unique: false
        description: country name or region name
      last_update:
        type: timestamp_ntz
        required: false
        primary: false
        unique: false
        description: last update timestamp
      latitude:
        type: double
        required: false
        primary: false
        unique: false
        description: location (latitude)
      longitude:
        type: double
        required: false
        primary: false
        unique: false
        description: location (longitude)
      confirmed:
        type: int
        required: false
        primary: false
        unique: false
        description: number of confirmed cases
      combined_key:
        type: string
        required: false
        primary: false
        unique: false
        description: combined county, province/state, and country/region names
quality:
  type: SodaCL
  specification:
    checks for covid_cases:
    - freshness(last_update::datetime) < 5000d
    - row_count > 1000