Welcome to my final project for the data science bootcamp!
For my project, I’m going to use COVID-19 case and death data from the CDC. Here’s the tibble that I’m going to be using:
# A tibble: 53,520 × 15
   submission_date state tot_cases conf_cases prob_cases new_case
   <chr>           <chr>     <dbl>      <dbl>      <dbl>    <dbl>
 1 12/22/2021      DE       165076     151750      13326      662
 2 03/18/2021      NE       206980         NA         NA      298
 3 09/01/2021      ND       118491     107475      11016      536
 4 03/28/2022      VT       107785         NA         NA      467
 5 03/11/2021      MD       390490         NA         NA      924
 6 04/21/2022      ID       445350     348949      96401        0
 7 02/02/2021      IL      1130917    1130917          0     2304
 8 12/13/2020      MD       234647         NA         NA     2638
 9 06/15/2020      WI        25480      22932       2548      185
10 03/10/2020      CA          157        157          0       24
# … with 53,510 more rows, and 9 more variables: pnew_case <dbl>,
# tot_death <dbl>, conf_death <dbl>, prob_death <dbl>,
# new_death <dbl>, pnew_death <dbl>, created_at <chr>,
# consent_cases <chr>, consent_deaths <chr>
This table gives us access to a large amount of data on COVID-19. For example, I can look up cumulative totals for cases and deaths (tot_cases, tot_death), as well as daily counts (new_case, new_death), for each state and submission date.
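To make that concrete, here is a minimal sketch of pulling out the total and daily case columns with dplyr. The tibble name `covid` is my own placeholder, and I rebuild just three rows from the printout above so the snippet runs on its own. I'm also assuming the dates are always in month/day/year format, which matches the rows shown.

```r
library(dplyr)
library(tibble)

# `covid` is a placeholder name; these three rows are copied from the printout above.
covid <- tibble(
  submission_date = c("12/22/2021", "03/18/2021", "09/01/2021"),
  state           = c("DE", "NE", "ND"),
  tot_cases       = c(165076, 206980, 118491),
  new_case        = c(662, 298, 536)
)

daily <- covid %>%
  # submission_date is stored as text, so convert it to a real Date first
  mutate(submission_date = as.Date(submission_date, format = "%m/%d/%Y")) %>%
  # keep the cumulative total and the daily count side by side
  select(submission_date, state, tot_cases, new_case) %>%
  # sort from oldest to newest
  arrange(submission_date)

print(daily)
```

The death columns (tot_death, new_death) could be selected the same way on the full tibble. I left them out here only because their values don't appear in the printout.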