PUMS1
================
Win-Vector LLC
4/26/2018

``` r
library("DBI")
library("dplyr") 
```

    ## 
    ## Attaching package: 'dplyr'

    ## The following objects are masked from 'package:stats':
    ## 
    ##     filter, lag

    ## The following objects are masked from 'package:base':
    ## 
    ##     intersect, setdiff, setequal, union

``` r
library("rquery")
```

    ## Loading required package: wrapr

    ## 
    ## Attaching package: 'wrapr'

    ## The following object is masked from 'package:dplyr':
    ## 
    ##     coalesce

``` r
# Load the PUMS person (pus) and housing (hus) extracts into an in-memory SQLite database
db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(db, "dpus", readRDS("ss16pus.RDS"))
dbWriteTable(db, "dhus", readRDS("ss16hus.RDS"))

dbGetQuery(db, "SELECT * FROM dpus LIMIT 5")
```

    ##   RT  SERIALNO SPORDER  PUMA         ST  ADJINC AGEP              CIT CITWP
    ## 1  P 000000338      03 02701 Alabama/AL 1007588   06 Born in the U.S.  <NA>
    ## 2  P 000000338      05 02701 Alabama/AL 1007588   08 Born in the U.S.  <NA>
    ## 3  P 000000343      03 01400 Alabama/AL 1007588   12 Born in the U.S.  <NA>
    ## 4  P 000000539      04 01400 Alabama/AL 1007588   11 Born in the U.S.  <NA>
    ## 5  P 000002284      02 00600 Alabama/AL 1007588   08 Born in the U.S.  <NA>
    ##    COW DDRS DEAR DEYE DOUT DPHY DRAT DRATX DREM  ENG  FER  GCL  GCM  GCR HINS1
    ## 1 <NA>   No   No   No <NA>   No <NA>  <NA>   No <NA> <NA> <NA> <NA> <NA>    No
    ## 2 <NA>   No   No   No <NA>   No <NA>  <NA>   No <NA> <NA> <NA> <NA> <NA>    No
    ## 3 <NA>   No   No   No <NA>   No <NA>  <NA>  Yes <NA> <NA> <NA> <NA> <NA>    No
    ## 4 <NA>   No   No   No <NA>   No <NA>  <NA>  Yes <NA> <NA> <NA> <NA> <NA>    No
    ## 5 <NA>   No   No   No <NA>   No <NA>  <NA>   No <NA> <NA> <NA> <NA> <NA>   Yes
    ##   HINS2 HINS3 HINS4 HINS5 HINS6 HINS7 INTP JWMNP JWRIP JWTR LANP
    ## 1    No    No   Yes    No    No    No <NA>  <NA>  <NA> <NA> <NA>
    ## 2    No    No   Yes    No    No    No <NA>  <NA>  <NA> <NA> <NA>
    ## 3    No    No   Yes    No    No    No <NA>  <NA>  <NA> <NA> <NA>
    ## 4    No    No   Yes    No    No    No <NA>  <NA>  <NA> <NA> <NA>
    ## 5    No    No    No    No    No    No <NA>  <NA>  <NA> <NA> <NA>
    ##                      LANX                                 MAR MARHD MARHM MARHT
    ## 1 No, speaks only English Never married or under 15 years old  <NA>  <NA>  <NA>
    ## 2 No, speaks only English Never married or under 15 years old  <NA>  <NA>  <NA>
    ## 3 No, speaks only English Never married or under 15 years old  <NA>  <NA>  <NA>
    ## 4 No, speaks only English Never married or under 15 years old  <NA>  <NA>  <NA>
    ## 5 No, speaks only English Never married or under 15 years old  <NA>  <NA>  <NA>
    ##   MARHW MARHYP                         MIG  MIL MLPA MLPB MLPCD MLPE MLPFG MLPH
    ## 1  <NA>   <NA> Yes, same house (nonmovers) <NA> <NA> <NA>  <NA> <NA>  <NA> <NA>
    ## 2  <NA>   <NA> Yes, same house (nonmovers) <NA> <NA> <NA>  <NA> <NA>  <NA> <NA>
    ## 3  <NA>   <NA> Yes, same house (nonmovers) <NA> <NA> <NA>  <NA> <NA>  <NA> <NA>
    ## 4  <NA>   <NA> Yes, same house (nonmovers) <NA> <NA> <NA>  <NA> <NA>  <NA> <NA>
    ## 5  <NA>   <NA> Yes, same house (nonmovers) <NA> <NA> <NA>  <NA> <NA>  <NA> <NA>
    ##   MLPI MLPJ MLPK NWAB NWAV NWLA NWLK NWRE  OIP  PAP                       RELP
    ## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> Biological son or daughter
    ## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>    Stepson or stepdaughter
    ## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> Biological son or daughter
    ## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> Biological son or daughter
    ## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>             Other relative
    ##   RETP                                  SCH    SCHG         SCHL SEMP    SEX
    ## 1 <NA> Yes, public school or public college Grade 1 Kindergarten <NA> Female
    ## 2 <NA> Yes, public school or public college Grade 2      Grade 1 <NA> Female
    ## 3 <NA> Yes, public school or public college Grade 6      Grade 5 <NA> Female
    ## 4 <NA> Yes, public school or public college Grade 4      Grade 4 <NA>   Male
    ## 5 <NA> Yes, public school or public college Grade 1 Kindergarten <NA>   Male
    ##   SSIP  SSP WAGP WKHP  WKL  WKW  WRK YOEP          ANC            ANC1P
    ## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>       Single African American
    ## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>       Single African American
    ## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>     Multiple African American
    ## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> Not reported     Not reported
    ## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> Not reported     Not reported
    ##          ANC2P DECADE                  DIS DRIVESP                         ESP
    ## 1 Not reported   <NA> Without a disability    <NA> Both parents in labor force
    ## 2 Not reported   <NA> Without a disability    <NA> Both parents in labor force
    ## 3        Irish   <NA>    With a disability    <NA>   Mother in the labor force
    ## 4 Not reported   <NA>    With a disability    <NA> Both parents in labor force
    ## 5 Not reported   <NA> Without a disability    <NA>                        <NA>
    ##    ESR FHICOVP FOD1P FOD2P                          HICOV
    ## 1 <NA>      No  <NA>  <NA> With health insurance coverage
    ## 2 <NA>      No  <NA>  <NA> With health insurance coverage
    ## 3 <NA>      No  <NA>  <NA> With health insurance coverage
    ## 4 <NA>      No  <NA>  <NA> With health insurance coverage
    ## 5 <NA>     Yes  <NA>  <NA> With health insurance coverage
    ##                          HISP INDP JWAP JWDP MIGPUMA MIGSP  MSP NAICSP NATIVITY
    ## 1 Not Spanish/Hispanic/Latino <NA> <NA> <NA>    <NA>  <NA> <NA>   <NA>   Native
    ## 2 Not Spanish/Hispanic/Latino <NA> <NA> <NA>    <NA>  <NA> <NA>   <NA>   Native
    ## 3 Not Spanish/Hispanic/Latino <NA> <NA> <NA>    <NA>  <NA> <NA>   <NA>   Native
    ## 4 Not Spanish/Hispanic/Latino <NA> <NA> <NA>    <NA>  <NA> <NA>   <NA>   Native
    ## 5 Not Spanish/Hispanic/Latino <NA> <NA> <NA>    <NA>  <NA> <NA>   <NA>   Native
    ##                                            NOP               OC OCCP PAOC PERNP
    ## 1 Living with two parents: Both parents NATIVE              Yes <NA> <NA>  <NA>
    ## 2 Living with two parents: Both parents NATIVE              Yes <NA> <NA>  <NA>
    ## 3       Living with mother only: Mother NATIVE              Yes <NA> <NA>  <NA>
    ## 4 Living with two parents: Both parents NATIVE              Yes <NA> <NA>  <NA>
    ## 5                                         <NA> No (includes GQ) <NA> <NA>  <NA>
    ##   PINCP       POBP POVPIP POWPUMA POWSP
    ## 1  <NA> Alabama/AL    158    <NA>  <NA>
    ## 2  <NA> Alabama/AL    158    <NA>  <NA>
    ## 3  <NA> Alabama/AL    072    <NA>  <NA>
    ## 4  <NA> Alabama/AL    003    <NA>  <NA>
    ## 5  <NA> Alabama/AL    079    <NA>  <NA>
    ##                                     PRIVCOV                         PUBCOV
    ## 1 Without private health insurance coverage    With public health coverage
    ## 2 Without private health insurance coverage    With public health coverage
    ## 3 Without private health insurance coverage    With public health coverage
    ## 4 Without private health insurance coverage    With public health coverage
    ## 5    With private health insurance coverage Without public health coverage
    ##                  QTRBIR                           RAC1P
    ## 1    April through June Black or African American alone
    ## 2 January through March Black or African American alone
    ## 3    April through June               Two or More Races
    ## 4 January through March                     White alone
    ## 5    April through June                     White alone
    ##                             RAC2P                            RAC3P RACAIAN
    ## 1 Black or African American alone  Black or African American alone      No
    ## 2 Black or African American alone  Black or African American alone      No
    ## 3               Two or More Races White; Black or African American      No
    ## 4                     White alone                      White alone      No
    ## 5                     White alone                      White alone      No
    ##   RACASN RACBLK RACNH RACNUM RACPI RACSOR RACWHT  RC SCIENGP SCIENGRLP  SFN
    ## 1     No    Yes    No      1    No     No     No Yes    <NA>      <NA> <NA>
    ## 2     No    Yes    No      1    No     No     No Yes    <NA>      <NA> <NA>
    ## 3     No    Yes    No      2    No     No    Yes Yes    <NA>      <NA> <NA>
    ## 4     No     No    No      1    No     No    Yes Yes    <NA>      <NA> <NA>
    ## 5     No     No    No      1    No     No    Yes Yes    <NA>      <NA> <NA>
    ##    SFR SOCP  VPS                     WAOB FAGEP FANCP FCITP FCITWP FCOWP FDDRSP
    ## 1 <NA> <NA> <NA> US state (POB = 001-059)    No    No    No     No    No     No
    ## 2 <NA> <NA> <NA> US state (POB = 001-059)    No    No    No     No    No     No
    ## 3 <NA> <NA> <NA> US state (POB = 001-059)    No    No    No     No    No     No
    ## 4 <NA> <NA> <NA> US state (POB = 001-059)    No    No    No     No    No     No
    ## 5 <NA> <NA> <NA> US state (POB = 001-059)    No    No    No     No    No    Yes
    ##   FDEARP FDEYEP FDISP FDOUTP FDPHYP FDRATP FDRATXP FDREMP FENGP FESRP FFERP
    ## 1     No     No    No     No     No     No      No     No    No    No    No
    ## 2     No     No    No     No     No     No      No     No    No    No    No
    ## 3     No     No    No     No     No     No      No     No    No    No    No
    ## 4     No     No    No     No     No     No      No     No    No    No    No
    ## 5    Yes    Yes   Yes     No    Yes     No      No    Yes    No    No    No
    ##   FFODP FGCLP FGCMP FGCRP FHINS1P FHINS2P FHINS3C FHINS3P FHINS4C FHINS4P
    ## 1    No    No    No    No      No      No    <NA>      No      No      No
    ## 2    No    No    No    No      No      No    <NA>      No      No      No
    ## 3    No    No    No    No      No      No    <NA>      No      No      No
    ## 4    No    No    No    No      No      No    <NA>      No      No      No
    ## 5    No    No    No    No     Yes     Yes    <NA>     Yes    <NA>     Yes
    ##   FHINS5C FHINS5P FHINS6P FHINS7P FHISP FINDP FINTP FJWDP FJWMNP FJWRIP FJWTRP
    ## 1    <NA>      No      No      No    No    No    No    No     No     No     No
    ## 2    <NA>      No      No      No    No    No    No    No     No     No     No
    ## 3    <NA>      No      No      No    No    No    No    No     No     No     No
    ## 4    <NA>      No      No      No    No    No    No    No     No     No     No
    ## 5    <NA>     Yes     Yes     Yes    No    No    No    No     No     No     No
    ##   FLANP FLANXP FMARHDP FMARHMP FMARHTP FMARHWP FMARHYP FMARP FMIGP FMIGSP
    ## 1    No     No      No      No      No      No      No    No    No     No
    ## 2    No     No      No      No      No      No      No    No    No     No
    ## 3    No     No      No      No      No      No      No    No    No     No
    ## 4    No     No      No      No      No      No      No    No    No     No
    ## 5    No     No      No      No      No      No      No    No    No     No
    ##   FMILPP FMILSP FOCCP FOIP FPAP FPERNP FPINCP FPOBP FPOWSP FPRIVCOVP FPUBCOVP
    ## 1     No     No    No   No   No     No     No    No     No        No       No
    ## 2     No     No    No   No   No     No     No    No     No        No       No
    ## 3     No     No    No   No   No     No     No    No     No        No       No
    ## 4     No     No    No   No   No     No     No    No     No        No       No
    ## 5     No     No    No   No   No     No     No    No     No       Yes      Yes
    ##   FRACP FRELP FRETP FSCHGP FSCHLP FSCHP FSEMP FSEXP FSSIP FSSP FWAGP FWKHP
    ## 1    No    No    No     No     No    No    No    No    No   No    No    No
    ## 2    No    No    No     No     No    No    No    No    No   No    No    No
    ## 3    No    No    No     No     No    No    No    No    No   No    No    No
    ## 4    No    No    No     No     No    No    No    No    No   No    No    No
    ## 5    No    No    No     No     No    No    No    No    No   No    No    No
    ##   FWKLP FWKWP FWRKP FYOEP
    ## 1    No    No    No    No
    ## 2    No    No    No    No
    ## 3    No    No    No    No
    ## 4    No    No    No    No
    ## 5    No    No    No    No

``` r
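# Lazy dplyr references to the database tables (no rows are pulled yet)
dpus <- tbl(db, "dpus")
dhus <- tbl(db, "dhus")

# print(dpus)
# glimpse(dpus)

# view(rsummary(db, "dpus"))

# Pull the person table into memory; this replaces the lazy reference above
dpus <- dbReadTable(db, "dpus")

# Keep only the columns used below
dpus <- dpus[, c("AGEP", "COW", "ESR",  "PERNP",
                 "PINCP","SCHL", "SEX", "WKHP")]

# These columns are stored as text; convert them to numeric
for(ci in c("AGEP", "PERNP", "PINCP", "WKHP")) {
  dpus[[ci]] <- as.numeric(dpus[[ci]])
}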
```

    ## Warning: NAs introduced by coercion

    ## Warning: NAs introduced by coercion
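
The coercion warnings above mean that some text values could not be parsed as numbers. The following is a minimal sketch of how one might list the offending values before converting. It uses a made-up vector `x` as a stand-in for a column such as `PINCP`; the values shown are hypothetical, not taken from the PUMS data.

``` r
# Hypothetical stand-in for a text column; the values are invented for illustration
x <- c("00100", "25000", NA, "bbbb")

# Convert quietly so we can inspect the result ourselves
converted <- suppressWarnings(as.numeric(x))

# Entries that were present in the input but became NA during conversion
newly_na <- is.na(converted) & !is.na(x)

# The distinct text values that failed to parse
bad_values <- unique(x[newly_na])
print(bad_values)
```

Applied to a real column (for example `dpus[[ci]]` before the conversion loop), the same pattern would show whether the new `NA` values come from expected placeholder codes or from unexpected content.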

``` r
# Truncate class-of-worker labels to 50 characters so they match the levels below
dpus$COW <- strtrim(dpus$COW, 50)

# str(dpus)

# Class-of-worker levels to keep (already truncated to 50 characters)
target_emp_levs <- c(
  "Employee of a private for-profit company or busine",
  "Employee of a private not-for-profit, tax-exempt, ",
  "Federal government employee",
  "Local government employee (city, county, etc.)",
  "Self-employed in own incorporated business, profes",
  "Self-employed in own not incorporated business, pr",
  "State government employee")

# Rows with no missing values in any of the selected columns
complete <- complete.cases(dpus)

# "Standard worker" criteria: employed, working 30+ hours, aged 18 to 65,
# with income and earnings above 1000 and at most 250000
stdworker <- with(dpus,
                  (PINCP>1000) &
                    (ESR=="Civilian employed, at work") &
                    (PINCP<=250000) &
                    (PERNP>1000) & (PERNP<=250000) &
                    (WKHP>=30) &
                    (AGEP>=18) & (AGEP<=65) &
                    (COW %in% target_emp_levs))

dpus <- dpus[complete & stdworker, , drop = FALSE]

# Collapse every schooling level other than the listed degrees into one level
no_advanced_degree <- is.na(dpus$SCHL) |
  (!(dpus$SCHL %in% c("Associate's degree",
                      "Bachelor's degree",
                      "Doctorate degree",
                      "Master's degree",
                      "Professional degree beyond a bachelor's degree")))
dpus$SCHL[no_advanced_degree] <- "No Advanced Degree"

# Convert to factors and set the reference (first) level of each
dpus$SCHL <- relevel(factor(dpus$SCHL),
                     "No Advanced Degree")
dpus$COW <- relevel(factor(dpus$COW),
                    target_emp_levs[[1]])
dpus$ESR <- relevel(factor(dpus$ESR),
                    "Civilian employed, at work")
dpus$SEX <- relevel(factor(dpus$SEX),
                    "Male")

saveRDS(dpus, "dpus_std_employee.RDS")

summary(dpus)
```

    ##       AGEP                                                       COW       
    ##  Min.   :18.00   Employee of a private for-profit company or busine:26101  
    ##  1st Qu.:31.00   Employee of a private not-for-profit, tax-exempt, : 2877  
    ##  Median :41.00   Federal government employee                       :  978  
    ##  Mean   :41.34   Local government employee (city, county, etc.)    : 2589  
    ##  3rd Qu.:52.00   Self-employed in own incorporated business, profes: 1180  
    ##  Max.   :65.00   Self-employed in own not incorporated business, pr: 1643  
    ##                  State government employee                         : 1743  
    ##                          ESR            PERNP            PINCP       
    ##  Civilian employed, at work:37111   Min.   :  1100   Min.   :  1100  
    ##                                     1st Qu.: 24000   1st Qu.: 25000  
    ##                                     Median : 40000   Median : 40000  
    ##                                     Mean   : 50268   Mean   : 51584  
    ##                                     3rd Qu.: 65000   3rd Qu.: 66000  
    ##                                     Max.   :250000   Max.   :250000  
    ##                                                                      
    ##                                              SCHL           SEX       
    ##  No Advanced Degree                            :20461   Male  :20125  
    ##  Associate's degree                            : 3670   Female:16986  
    ##  Bachelor's degree                             : 8407                 
    ##  Doctorate degree                              :  478                 
    ##  Master's degree                               : 3386                 
    ##  Professional degree beyond a bachelor's degree:  709                 
    ##                                                                       
    ##       WKHP      
    ##  Min.   :30.00  
    ##  1st Qu.:40.00  
    ##  Median :40.00  
    ##  Mean   :42.66  
    ##  3rd Qu.:45.00  
    ##  Max.   :98.00  
    ## 
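
The row selection above combines `complete.cases()` with a logical condition. A small sketch of why that combination matters: comparisons on missing values return `NA`, and indexing a data frame with `NA` produces all-`NA` rows, while `FALSE & NA` is `FALSE`. The data frame `d` and its columns are hypothetical toy data, not the PUMS table.

``` r
# Toy data with some missing values (invented for illustration)
d <- data.frame(
  income = c(50000, NA, 500, 80000),
  hours  = c(40, 45, 40, NA)
)

# TRUE only for rows with no missing values
complete <- complete.cases(d)

# Comparisons involving NA yield NA, so this mask contains NA entries
keep <- with(d, (income > 1000) & (hours >= 30))

# FALSE & NA is FALSE, so the combined mask has no NA entries
mask <- complete & keep

# Clean subset: only fully observed rows that meet the condition
d_sub <- d[mask, , drop = FALSE]

# Indexing with the raw mask instead would add all-NA rows
d_raw <- d[keep, , drop = FALSE]

print(nrow(d_sub))
print(nrow(d_raw))
```

With this toy input the combined mask keeps one row, while the raw mask returns three rows, two of which consist entirely of `NA`.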

``` r
plt <- WVPlots::ScatterHist(
  dpus, "AGEP", "PINCP",
  "Expected income (PINCP) as a function of age (AGEP)",
  smoothmethod = "lm",
  point_alpha = 0.025)
```

![](PUMS1_files/figure-gfm/unnamed-chunk-1-1.png)<!-- -->

``` r
print(plt)
```

    ## TableGrob (3 x 2) "arrange": 5 grobs
    ##   z     cells    name                grob
    ## 1 1 (2-2,1-1) arrange      gtable[layout]
    ## 2 2 (2-2,2-2) arrange      gtable[layout]
    ## 3 3 (3-3,1-1) arrange      gtable[layout]
    ## 4 4 (3-3,2-2) arrange      gtable[layout]
    ## 5 5 (1-1,1-2) arrange text[GRID.text.140]

``` r
DBI::dbDisconnect(db)
```
