We’ll need the following packages
## general database tools
library(dbplyr)
library(RMySQL)
## access functions for the specific example
library(mdsr)
The dbConnect()
function allows us to open a connection to a remote database. The dbConnect_scidb()
function is a wrapper to quickly access the scidb database at Smith College, which the examples (and lab) will use.
The db
object is a connection: it doesn’t contain data itself, rather it’s like a portal through which we can bring data from the remote database.
In order to know what data is available, we can list the datasets (tables) that are visible through the connection.
## [1] "airports" "carriers" "flights" "planes"
This is the collection of flights datasets but in database form.
We can create an R object that we can then interact with as if it were a dataset we had read in (it hasn’t actually read in the data to memory though, so you can’t open it in the RStudio data viewer)
dplyr
wrangling with the R view of the dataWe can use this flights
with dplyr
wrangling functions much like we could if it were an actual R dataset. For example:
query <- flights %>%
filter(year == 2013 & month == 9) %>%
filter(dest == "JFK") %>%
group_by(carrier) %>%
summarize(
N = n(),
pct_ontime = sum(arr_delay <= 15) / n()) %>%
inner_join(carriers, by = c("carrier" = "carrier")) %>%
select(name, N, pct_ontime) %>%
filter(N >= 100) %>%
arrange(desc(pct_ontime))
## Warning: Missing values are always removed in SQL.
## Use `SUM(x, na.rm = TRUE)` to silence this warning
## This warning is displayed only once per session.
## Warning in .local(conn, statement, ...): Decimal MySQL column 2 imported as
## numeric
## # Source: lazy query [?? x 3]
## # Database: mysql 5.7.33-log
## # [@mdsr.cdc7tgkkqd0n.us-east-1.rds.amazonaws.com:/airlines]
## # Ordered by: desc(pct_ontime)
## name N pct_ontime
## <chr> <dbl> <dbl>
## 1 Delta Air Lines Inc. 1770 0.876
## 2 Endeavor Air Inc. 1067 0.848
## 3 American Airlines Inc. 1055 0.846
## 4 Virgin America 291 0.845
## 5 JetBlue Airways 3341 0.826
Note that not all dplyr
functions will work because they require a tibble. For example, the slice_*()
functions don’t work.
The wrangling we did above could be done by writing an SQL query. In fact, R has to translate our dplyr
pipeline into SQL before it can be executed.
We can see the auto-generated SQL code, though because it’s generated algorithmically, it’s often a lot less intelligible than actual SQL code written directly would be.
## <SQL>
## SELECT *
## FROM (SELECT `name`, `N`, `pct_ontime`
## FROM (SELECT `LHS`.`carrier` AS `carrier`, `N`, `pct_ontime`, `name`
## FROM (SELECT `carrier`, COUNT(*) AS `N`, SUM(`arr_delay` <= 15.0) / COUNT(*) AS `pct_ontime`
## FROM (SELECT *
## FROM `flights`
## WHERE (`year` = 2013.0 AND `month` = 9.0)) `q01`
## WHERE (`dest` = 'JFK')
## GROUP BY `carrier`) `LHS`
## INNER JOIN `carriers` AS `RHS`
## ON (`LHS`.`carrier` = `RHS`.`carrier`)
## ) `q01`) `q02`
## WHERE (`N` >= 100.0)
## ORDER BY `pct_ontime` DESC
Here’s what the query (with head(n=5)
) might look like if we actually wrote it in SQL ourselves. (This will make more sense after you do the lab)
SELECT
c.name,
sum(1) as N,
sum(arr_delay <= 15) / sum(1) as pct_ontime
FROM flights f
JOIN carriers c ON f.carrier = c.carrier
WHERE year = 2013 AND month = 9 AND dest = 'JFK'
GROUP BY name
HAVING N >= 100
ORDER BY pct_ontime desc
LIMIT 0,5;
name | N | pct_ontime |
---|---|---|
Delta Air Lines Inc. | 1770 | 0.8757 |
Endeavor Air Inc. | 1067 | 0.8482 |
American Airlines Inc. | 1055 | 0.8455 |
Virgin America | 291 | 0.8454 |
JetBlue Airways | 3341 | 0.8264 |
Sometimes we can get the SQL code equivalent to a simple R command by making use of the algorithmic translation.
## Warning: Missing values are always removed in SQL.
## Use `AVG(x, na.rm = TRUE)` to silence this warning
## This warning is displayed only once per session.
## <SQL> AVG(`arr_delay`) OVER ()
Though the above isn’t quite equivalent to what the R function would do. It’s actually more like the R code:
## <SQL> AVG(`arr_delay`) OVER ()
If the data isn’t too big, we can convert the whole thing into an actual R dataset (rather than an R view of the SQL data, which is what we did above) with collect()
. This will let us use some R code that might not work on the “view”.
But this should be done with caution, because bigger datasets can overload the system (probably don’t want to do this with flights
, for instance)
## # A tibble: 1,610 x 2
## carrier name
## <chr> <chr>
## 1 02Q Titan Airways
## 2 04Q Tradewind Aviation
## 3 05Q Comlux Aviation, AG
## 4 06Q Master Top Linhas Aereas Ltd.
## 5 07Q Flair Airlines Ltd.
## 6 09Q Swift Air, LLC
## 7 0BQ DCA
## 8 0CQ ACM AIR CHARTER GmbH
## 9 0GQ Inter Island Airways, d/b/a Inter Island Air
## 10 0HQ Polar Airlines de Mexico d/b/a Nova Air
## # … with 1,600 more rows
To see the difference, we can look at how much memory the “view” object takes up, compared to the collect()
ed version
## 3.6 Kb
## 234.8 Kb
It’s about 70 times bigger.
Imagine we had a dataset with a million cases and 100 variables…
n_entries <- 100 * 1000000
LargeData <- runif(n_entries) %>% matrix(ncol = 100) %>% as.data.frame()
head(LargeData)
## V1 V2 V3 V4 V5 V6
## 1 9.582709e-01 0.9922512 0.1110860 0.2018943 0.92692726 0.37770663
## 2 8.275782e-01 0.2445587 0.9042847 0.4021900 0.40541440 0.87778504
## 3 5.211565e-05 0.7543153 0.8484690 0.2576616 0.14036474 0.33231068
## 4 5.238584e-01 0.7341216 0.9096582 0.9411570 0.08149133 0.70059507
## 5 1.857047e-01 0.9998941 0.4997949 0.8504782 0.35023587 0.92871656
## 6 3.931427e-01 0.1029113 0.7763494 0.4322868 0.06987871 0.02113734
## V7 V8 V9 V10 V11 V12 V13
## 1 0.9241682 0.7321434 0.6457445 0.84090746 0.37699683 0.11303564 0.8445392
## 2 0.9409175 0.8463810 0.6960195 0.28127747 0.33710478 0.09618753 0.8602856
## 3 0.4061917 0.6593970 0.4816522 0.04773262 0.60816545 0.93996437 0.8947656
## 4 0.2996085 0.4513319 0.8648686 0.91937293 0.63258345 0.96702538 0.5577387
## 5 0.8847899 0.5804747 0.4582482 0.43036180 0.07115559 0.38610356 0.6712138
## 6 0.1112059 0.9062408 0.6533471 0.23267113 0.95912483 0.74110738 0.5601601
## V14 V15 V16 V17 V18 V19 V20
## 1 0.4215485 0.5024041 0.3324905 0.5798861 0.73656287 0.9058820 0.06553702
## 2 0.7967206 0.5880284 0.6873303 0.5340180 0.99820694 0.5521920 0.73798591
## 3 0.3784760 0.7905586 0.8359667 0.1974253 0.08187754 0.3770132 0.56443771
## 4 0.5431132 0.8198253 0.4652053 0.6069765 0.38944993 0.2278911 0.28811750
## 5 0.1293676 0.2073670 0.3025314 0.9324739 0.36084299 0.9156316 0.92158419
## 6 0.2737923 0.6725885 0.9131288 0.5731909 0.88643173 0.9495621 0.79468594
## V21 V22 V23 V24 V25 V26 V27
## 1 0.5268952 0.22102920 0.1614168 0.5398129 0.5534918 0.63053215 0.9138582
## 2 0.7184214 0.47497728 0.7540147 0.5170090 0.7088770 0.66099985 0.6457488
## 3 0.4521553 0.10054078 0.7048759 0.8284640 0.2597262 0.05071979 0.1015634
## 4 0.3425029 0.03766112 0.7062673 0.3940008 0.9554836 0.06397756 0.1072469
## 5 0.0808702 0.71542450 0.2866293 0.3385621 0.8423259 0.70070237 0.4397458
## 6 0.9134128 0.10396219 0.5619998 0.8792314 0.9389477 0.08229026 0.9443304
## V28 V29 V30 V31 V32 V33 V34
## 1 0.95372513 0.8196622 0.3350209 0.11008057 0.5885140 0.2824735 0.46979932
## 2 0.83527093 0.3982447 0.3380713 0.99021123 0.4240839 0.6628323 0.38366060
## 3 0.65671663 0.8255118 0.8969452 0.65022025 0.4359607 0.7181876 0.22302089
## 4 0.77768441 0.3026732 0.4322319 0.06919407 0.4425501 0.7780290 0.62058674
## 5 0.07671021 0.1579417 0.4917127 0.52673065 0.9817886 0.4669908 0.43678329
## 6 0.56186048 0.5146895 0.5882684 0.66306858 0.6195238 0.3501962 0.04590658
## V35 V36 V37 V38 V39 V40 V41
## 1 0.3124830 0.5470640 0.1808295 0.7330374 0.22604512 0.1875547 0.8345588
## 2 0.5881805 0.7342948 0.3853905 0.3758414 0.02682627 0.6894175 0.7174112
## 3 0.4542990 0.9491330 0.2006372 0.9563640 0.66758427 0.3178381 0.9217305
## 4 0.5084065 0.9848869 0.7816868 0.3391856 0.79361847 0.4551915 0.2143525
## 5 0.7209006 0.3152856 0.7292400 0.6094060 0.24921339 0.1156424 0.5158954
## 6 0.4309317 0.4723814 0.6922514 0.5415480 0.20744677 0.8530895 0.2648333
## V42 V43 V44 V45 V46 V47 V48
## 1 0.08691973 0.45780129 0.3639167 0.2329356 0.6524616 0.3093971 0.85254289
## 2 0.14015128 0.82960765 0.6484462 0.5121706 0.1763923 0.4618049 0.97822864
## 3 0.34644847 0.11776351 0.8821448 0.2468129 0.5940904 0.8928482 0.72971691
## 4 0.56506518 0.05162098 0.9412839 0.8606908 0.6702378 0.2151544 0.68620139
## 5 0.41518479 0.57804145 0.8208939 0.8159484 0.2133537 0.7794204 0.04970226
## 6 0.54492781 0.77628252 0.4784446 0.7489294 0.4179912 0.1018776 0.08423770
## V49 V50 V51 V52 V53 V54
## 1 0.05011054 0.19976264 0.346614554 0.9770470 0.8267348 0.6334371
## 2 0.31570993 0.58907720 0.781645620 0.8480106 0.6347199 0.6175607
## 3 0.57858472 0.79191134 0.515945126 0.3766002 0.3026915 0.2345405
## 4 0.70633191 0.82395169 0.377163650 0.9984267 0.6800605 0.1634893
## 5 0.28389753 0.58162969 0.920986567 0.5618137 0.9910437 0.6067510
## 6 0.12919771 0.03711531 0.001498685 0.2329754 0.4613257 0.5295072
## V55 V56 V57 V58 V59 V60
## 1 0.53094921 0.21623788 0.8614596 0.33171237 0.9341563 0.2200101
## 2 0.47861894 0.73678839 0.7281260 0.14987313 0.5004925 0.6787636
## 3 0.72074411 0.68317191 0.2861561 0.53523440 0.3076221 0.1334552
## 4 0.86068967 0.04197453 0.1593689 0.52065730 0.9146829 0.6291156
## 5 0.04583296 0.37435635 0.8326694 0.06079883 0.6847007 0.2338096
## 6 0.83359670 0.97424794 0.2428933 0.78732705 0.5660282 0.1880929
## V61 V62 V63 V64 V65 V66
## 1 0.47981659 0.3152615 0.234957625 0.20673160 0.28288922 0.4930318
## 2 0.02402208 0.3130100 0.007754755 0.78323134 0.83275916 0.8413838
## 3 0.35898984 0.7538553 0.332491866 0.22549944 0.89242656 0.8668341
## 4 0.94779962 0.9199844 0.369502428 0.41298377 0.35668981 0.2780697
## 5 0.86410667 0.8637177 0.976842775 0.03755392 0.09185347 0.2204420
## 6 0.28906298 0.3919817 0.528322783 0.15433444 0.15536724 0.8300934
## V67 V68 V69 V70 V71 V72
## 1 0.4252947 0.7002006 0.777502397 0.21700370 0.17315534 0.3743983
## 2 0.1952382 0.1936738 0.007402987 0.09291439 0.05342468 0.7761527
## 3 0.5683557 0.5022794 0.421756913 0.37493134 0.20992576 0.9337587
## 4 0.6870402 0.6021943 0.477648763 0.15935415 0.01790243 0.3987472
## 5 0.4998683 0.2721193 0.574701922 0.09861315 0.19274351 0.7465204
## 6 0.7169159 0.9765102 0.516474072 0.74254187 0.18122967 0.7793678
## V73 V74 V75 V76 V77 V78 V79
## 1 0.5974588 0.76641946 0.1153148 0.6061773 0.84756187 0.4838876 0.8577100
## 2 0.3950155 0.15711291 0.2940018 0.4688531 0.05814825 0.4487619 0.4451039
## 3 0.0402423 0.58409636 0.3314035 0.8811726 0.86637747 0.3525458 0.1333492
## 4 0.2260923 0.44701136 0.2872220 0.3407173 0.59933234 0.3292218 0.1050910
## 5 0.6262719 0.01958447 0.8956871 0.5171751 0.37101680 0.2528866 0.4862446
## 6 0.8462474 0.27717063 0.7396417 0.1456409 0.48084612 0.5423240 0.1755146
## V80 V81 V82 V83 V84 V85 V86
## 1 0.48708573 0.26643755 0.6820924 0.61044474 0.5608281 0.5623676 0.4567782
## 2 0.50336126 0.55115940 0.5023934 0.87433810 0.1477371 0.6742229 0.1601285
## 3 0.55604233 0.28791968 0.1651268 0.09728313 0.4839350 0.3967569 0.3787584
## 4 0.84355108 0.26052866 0.8070973 0.17624895 0.2043278 0.1271272 0.0144499
## 5 0.81585905 0.04895874 0.1797900 0.29220557 0.3791552 0.5182834 0.3905862
## 6 0.03522949 0.21229430 0.0849170 0.49969182 0.4004684 0.9415648 0.1258903
## V87 V88 V89 V90 V91 V92 V93
## 1 0.7626591 0.2180036 0.7442870 0.41256313 0.7501933 0.4564545 0.9580733
## 2 0.8328556 0.7185586 0.8620664 0.47658683 0.8160589 0.4328343 0.3926681
## 3 0.1813921 0.4468990 0.4924655 0.16937879 0.9484113 0.6650279 0.6605460
## 4 0.5160538 0.1310464 0.7629739 0.22056204 0.9960883 0.2635354 0.3749227
## 5 0.3757253 0.1558633 0.8154368 0.07778942 0.3577418 0.5637030 0.5066815
## 6 0.6351672 0.9344551 0.6718389 0.13685002 0.8191125 0.6294466 0.6056130
## V94 V95 V96 V97 V98 V99 V100
## 1 0.74299622 0.9372414 0.2990555 0.3055707 0.851845863 0.9236000 0.3213596
## 2 0.63276982 0.4216996 0.5496787 0.6490230 0.239391752 0.6148555 0.1894100
## 3 0.76141462 0.1507744 0.9086228 0.9663571 0.014460469 0.9062310 0.7897107
## 4 0.36047587 0.5312079 0.9025777 0.2441219 0.005079025 0.8869425 0.1086141
## 5 0.59693755 0.5834280 0.4751858 0.1520234 0.920813971 0.2201902 0.4662787
## 6 0.07630926 0.7267419 0.7488450 0.2114582 0.908823544 0.4001999 0.3564068
## 763 Mb
Nearly a gigabyte. This still fits in RAM, but 10 million cases with 100 variables may not (depending on the system), and 100M almost certainly wouldn’t.
Let’s free that memory by purging the data from our environment
dplyr
and SQL codeInstead of reading the whole dataset into memory, we can often find SQL equivalents for the things that we might not be able to do with an SQL view using dplyr
code, and use it within dplyr
code to make a sort of hybrid command:
## # Source: lazy query [?? x 3]
## # Database: mysql 5.7.33-log
## # [@mdsr.cdc7tgkkqd0n.us-east-1.rds.amazonaws.com:/airlines]
## carrier name name_code
## <chr> <chr> <chr>
## 1 02Q Titan Airways Titan Airways(02Q)
## 2 04Q Tradewind Aviation Tradewind Aviation(04Q)
## 3 05Q Comlux Aviation, AG Comlux Aviation, AG(05Q)
## 4 06Q Master Top Linhas Aereas Ltd. Master Top Linhas Aereas Ltd.(06…
## 5 07Q Flair Airlines Ltd. Flair Airlines Ltd.(07Q)
## 6 09Q Swift Air, LLC Swift Air, LLC(09Q)
## 7 0BQ DCA DCA(0BQ)
## 8 0CQ ACM AIR CHARTER GmbH ACM AIR CHARTER GmbH(0CQ)
## 9 0GQ Inter Island Airways, d/b/a I… Inter Island Airways, d/b/a Inte…
## 10 0HQ Polar Airlines de Mexico d/b/… Polar Airlines de Mexico d/b/a N…
## # … with more rows
Notice that CONCAT is not a valid R function
The scidb
database also has a bunch of data from IMDB.
## [1] "aka_name" "aka_title" "cast_info"
## [4] "char_name" "comp_cast_type" "company_name"
## [7] "company_type" "complete_cast" "info_type"
## [10] "keyword" "kind_type" "link_type"
## [13] "movie_companies" "movie_info" "movie_info_idx"
## [16] "movie_keyword" "movie_link" "name"
## [19] "person_info" "role_type" "title"
If we know the SQL query we want, we can make an R string from it and sent it do the database with dbGetQuery()
. The following query returns a list of the different kinds of “entities” that the imdb database has information about (represented in the kind_type
table)
## id kind
## 1 1 movie
## 2 2 tv series
## 3 3 tv movie
## 4 4 video movie
## 5 5 tv mini series
## 6 6 video game
## 7 7 episode
This one gives us the table that lists the kinds of ‘properties’ of these entities that the data includes.
## id info
## 1 1 runtimes
## 2 2 color info
## 3 3 genres
## 4 4 languages
## 5 5 certificates
## 6 6 sound mix
## 7 7 tech info
## 8 8 countries
## 9 9 taglines
## 10 10 keywords
## 11 11 alternate versions
## 12 12 crazy credits
## 13 13 goofs
## 14 14 soundtrack
## 15 15 quotes
## 16 16 release dates
## 17 17 trivia
## 18 18 locations
## 19 19 mini biography
## 20 20 birth notes
## 21 21 birth date
## 22 22 height
## 23 23 death date
## 24 24 spouse
## 25 25 other works
## 26 26 birth name
## 27 27 salary history
## 28 28 nick names
## 29 29 books
## 30 30 agent address
## 31 31 biographical movies
## 32 32 portrayed in
## 33 33 where now
## 34 34 trade mark
## 35 35 interviews
## 36 36 article
## 37 37 magazine cover photo
## 38 38 pictorial
## 39 39 death notes
## 40 40 LD disc format
## 41 41 LD year
## 42 42 LD digital sound
## 43 43 LD official retail price
## 44 44 LD frequency response
## 45 45 LD pressing plant
## 46 46 LD length
## 47 47 LD language
## 48 48 LD review
## 49 49 LD spaciality
## 50 50 LD release date
## 51 51 LD production country
## 52 52 LD contrast
## 53 53 LD color rendition
## 54 54 LD picture format
## 55 55 LD video noise
## 56 56 LD video artifacts
## 57 57 LD release country
## 58 58 LD sharpness
## 59 59 LD dynamic range
## 60 60 LD audio noise
## 61 61 LD color information
## 62 62 LD group genre
## 63 63 LD quality program
## 64 64 LD close captions-teletext-ld-g
## 65 65 LD category
## 66 66 LD analog left
## 67 67 LD certification
## 68 68 LD audio quality
## 69 69 LD video quality
## 70 70 LD aspect ratio
## 71 71 LD analog right
## 72 72 LD additional information
## 73 73 LD number of chapter stops
## 74 74 LD dialogue intellegibility
## 75 75 LD disc size
## 76 76 LD master format
## 77 77 LD subtitles
## 78 78 LD status of availablility
## 79 79 LD quality of source
## 80 80 LD number of sides
## 81 81 LD video standard
## 82 82 LD supplement
## 83 83 LD original title
## 84 84 LD sound encoding
## 85 85 LD number
## 86 86 LD label
## 87 87 LD catalog number
## 88 88 LD laserdisc title
## 89 89 screenplay-teleplay
## 90 90 novel
## 91 91 adaption
## 92 92 book
## 93 93 production process protocol
## 94 94 printed media reviews
## 95 95 essays
## 96 96 other literature
## 97 97 mpaa
## 98 98 plot
## 99 99 votes distribution
## 100 100 votes
## 101 101 rating
## 102 102 production dates
## 103 103 copyright holder
## 104 104 filming dates
## 105 105 budget
## 106 106 weekend gross
## 107 107 gross
## 108 108 opening weekend
## 109 109 rentals
## 110 110 admissions
## 111 111 studios
## 112 112 top 250 rank
## 113 113 bottom 10 rank
This one uses WHERE
, which is the SQL equivalent of filter()
.
(Note: title
is the name of both a table and a variable in that table, annoyingly)
(Note 2: since the whole query is quoted with double quotes, we need single quotes for quoted expressions within the query)
## id title imdb_index kind_id production_year
## 1 476823 The Empire Strikes Back <NA> 7 1980
## 2 461950 The Empire Strikes Back <NA> 7 2015
## 3 919851 The Empire Strikes Back <NA> 7 2015
## 4 924929 The Empire Strikes Back <NA> 7 2017
## 5 1751199 The Empire Strikes Back <NA> 7 2003
## 6 1890901 The Empire Strikes Back <NA> 7 2016
## 7 2530173 The Empire Strikes Back <NA> 7 2005
## 8 2780701 The Empire Strikes Back <NA> 7 1986
## 9 2916719 The Empire Strikes Back <NA> 7 2012
## 10 4346960 The Empire Strikes Back <NA> 6 1992
## imdb_id phonetic_code episode_of_id season_nr episode_nr series_years
## 1 NA E5162 476562 NA NA <NA>
## 2 NA E5162 461323 5 91 <NA>
## 3 NA E5162 919801 1 7 <NA>
## 4 NA E5162 924922 1 1 <NA>
## 5 NA E5162 1751194 1 2 <NA>
## 6 NA E5162 1890747 7 1 <NA>
## 7 NA E5162 2530161 1 12 <NA>
## 8 NA E5162 2780696 1 9 <NA>
## 9 NA E5162 2916717 1 3 <NA>
## 10 NA E5162 NA NA NA <NA>
## md5sum
## 1 ec3a52f2c4ad8513889060bfc28871ad
## 2 0fa9a1c1f9581128e420e19f51fe2689
## 3 0a40b8ea1fce567f98bc5c8be3d695a6
## 4 98f5d44fdec7f4bd441353ebc014850f
## 5 ba276f18d6d8e4fb7e13c82493e0af8a
## 6 639bde13d7843adfaab1ed154cef949a
## 7 4fc2a75a237ba38bdc71c1add6dae7ff
## 8 d7de5425004a87ef0adf739cc6a5ef59
## 9 430d9e1af0842d60e501e86de44e05f6
## 10 8f45ef75543e1ad1173b5d6bfc0d6c12
To give more than one filter condition, we can use AND
and/or OR
. This one should restrict entries to movies titled “The Empire Strikes Back”…
im_db %>%
dbGetQuery(
"SELECT *
FROM title
WHERE title = 'The Empire Strikes Back'
AND kind_id = 1;"
)
## [1] id title imdb_index kind_id
## [5] production_year imdb_id phonetic_code episode_of_id
## [9] season_nr episode_nr series_years md5sum
## <0 rows> (or 0-length row.names)
Oops, nothing matches. Any idea why?
We can use “wildcards” to search for titles that contain a phrase
im_db %>%
dbGetQuery(
"SELECT *
FROM title
WHERE title LIKE '%The Empire Strikes Back%'
AND kind_id = 1;"
)
## id title imdb_index
## 1 4260164 Star Wars: Episode V - The Empire Strikes Back <NA>
## 2 4346961 The Empire Strikes Back Part II (Parody) <NA>
## kind_id production_year imdb_id phonetic_code episode_of_id season_nr
## 1 1 1980 NA S3621 NA NA
## 2 1 2016 NA E5162 NA NA
## episode_nr series_years md5sum
## 1 NA <NA> 99c3b6266cd096147b1d7af2b8353edb
## 2 NA <NA> 759372cfa023e3f7485dfe0db5bdc13d
The following is equivalent to a select()
, filter()
and arrange()
sql_starwars_query <- "
SELECT t.title, t.production_year
FROM title t
WHERE t.title LIKE '%The Empire Strikes Back%'
AND t.kind_id = 1
ORDER BY production_year;
"
## title production_year
## 1 Star Wars: Episode V - The Empire Strikes Back 1980
## 2 The Empire Strikes Back Part II (Parody) 2016
This one involves a join, along with a select and filter to get the cast of The Empire Strikes Back (to specify it uniquely we use the id we found for it above)
im_db %>%
dbGetQuery(
"SELECT n.name, ci.role_id
FROM cast_info ci
JOIN name n ON n.id = ci.person_id
WHERE movie_id = 4260164;"
)
## name role_id
## 1 Anderson, Bob 1
## 2 Austen, Alan 1
## 3 Baker, Kenny 1
## 4 Bear, Lightning 1
## 5 Boa, Bruce 1
## 6 Bonehill, Richard 1
## 7 Bonehill, Richard 1
## 8 Bonehill, Richard 1
## 9 Bonehill, Richard 1
## 10 Buchanan, Stephen 1
## 11 Bulloch, Jeremy 1
## 12 Bunn, Chris 1
## 13 Bush, Morris 1
## 14 Butterfield, Trevor 1
## 15 Butterfield, Trevor 1
## 16 Cannon, John 1
## 17 Capri, Mark 1
## 18 Chancer, Norman 1
## 19 Clarkin, Tony 1
## 20 Colley, Kenneth 1
## 21 Culver, Michael 1
## 22 Curry, Shaun 1
## 23 Daniels, Anthony 1
## 24 Dew, Martin 1
## 25 Diamond, Peter 1
## 26 Dicks, John 1
## 27 Dowdall, Jim 1
## 28 Duff, Norwich 1
## 29 Durrant, Ian 1
## 30 Durrant, Ian 1
## 31 Edmonds, Mike 1
## 32 Egeland, Tom 1
## 33 Fell, Stuart 1
## 34 Flood, Tom 1
## 35 Flyng, Alan 1
## 36 Ford, Harrison 1
## 37 Frandy, Michael A. 1
## 38 Ginter, Patrick 1
## 39 Glover, Julian 1
## 40 Guinness, Alec 1
## 41 Hamill, Mark 1
## 42 Harris, Alan 1
## 43 Harris, Alan 1
## 44 Harte, Jerry 1
## 45 Hassett, Ray 1
## 46 Henry, Walter 1
## 47 Hollis, John 1
## 48 Jerricho, Paul 1
## 49 Johns, Milton 1
## 50 Johnston, Joe 1
## 51 Jones, James Earl 1
## 52 Jones, Mark 1
## 53 Juritzen, Arve 1
## 54 Klein, Paul 1
## 55 Lawson, Denis 1
## 56 Liston, Ian 1
## 57 Maguire, Oliver 1
## 58 Malcolm, Christopher 1
## 59 Mayhew, Peter 1
## 60 McDiarmid, Ian 1
## 61 McDonald, Mac 1
## 62 McKenzie, Jack 1
## 63 McQuarrie, Ralph 1
## 64 Meek, Steve 1
## 65 Meek, Steve 1
## 66 Meek, Steve 1
## 67 Meek, Steve 1
## 68 Morrison, Temuera 1
## 69 Morse, Ralph 1
## 70 Morse, Ralph 1
## 71 Morse, Ralph 1
## 72 Morton, John 1
## 73 Nelson, C. Andrew 1
## 74 Oldfield, Richard 1
## 75 Oz, Frank 1
## 76 Parsons, Chris 1
## 77 Parsons, Chris 1
## 78 Parsons, Chris 1
## 79 Pierre, Quentin 1
## 80 Prowse, David 1
## 81 Purvis, Jack 1
## 82 Ratzenberger, John 1
## 83 Revill, Clive 1
## 84 Richards, Terry 1
## 85 Robinson, Doug 1
## 86 Roy, Peter 1
## 87 Santiago, Michael 1
## 88 Scobey, Robin 1
## 89 Sheard, Michael 1
## 90 Sidoli, Richard 1
## 91 Smart, Tony 1
## 92 Swaden, Alan 1
## 93 Tucker, Burnell 1
## 94 Webb, Des 1
## 95 Weed, Harold 1
## 96 Williams, Billy Dee 1
## 97 Williams, Treat 1
## 98 Wingreen, Jason 1
## 99 Burleigh, Mercedes 2
## 100 Eaton, Marjorie 2
## 101 English, Stephanie 2
## 102 Fisher, Carrie 2
## 103 Hillkurtz, Tiffany 2
## 104 Hudson, Susie 2
## 105 Kahn, Brigitte 2
## 106 Munroe, Cathy 2
## 107 Munroe, Cathy 2
## 108 Turk, Marolyn 2
## 109 Bloom, Jim 3
## 110 Kazanjian, Howard G. 3
## 111 Kurtz, Gary 3
## 112 Lucas, George 3
## 113 McCallum, Rick 3
## 114 Watts, Robert 3
## 115 Brackett, Leigh 4
## 116 Kasdan, Lawrence 4
## 117 Lucas, George 4
## 118 Suschitzky, Peter 5
## 119 Williams, John 6
## 120 Mollo, John 7
## 121 Kershner, Irvin 8
## 122 Christopher, T.M. 9
## 123 Hirsch, Paul 9
## 124 Lucas, George 9
## 125 Lucas, Marcia 9
## 126 Alsup, Bunny 10
## 127 Arnold, Alan 10
## 128 Barclay, David Alan 10
## 129 Bay, Jane 10
## 130 Butcher, Ron 10
## 131 Calcutt, Stephen 10
## 132 Carey, Des 10
## 133 Coke, Liz 10
## 134 Cook, Ron 10
## 135 Dearlove, Jack 10
## 136 Farrar, Kathryn 10
## 137 Fox, Tim 10
## 138 Ginter, Patrick 10
## 139 Ginter, Patrick 10
## 140 Gordon, Ken 10
## 141 Harley, Barbara 10
## 142 Harris, Alan 10
## 143 Harris, James 10
## 144 Herman, Miki 10
## 145 Larkins, Michael 10
## 146 Laws, Nick 10
## 147 Mann, Pamela 10
## 148 Marshall, Mark 10
## 149 Meek, Steve 10
## 150 Midener, Wendy 10
## 151 Miller, Craig 10
## 152 Morton, John 10
## 153 Mullen, Kathryn 10
## 154 Parsons, Chris 10
## 155 Phipps, Ron 10
## 156 Rawlings, Kay 10
## 157 Roy, Deep 10
## 158 Shaw, Debbie 10
## 159 Wessler, Charles B. 10
## 160 Wing, Kristen 10
## 161 Wolff, Marc 10
## 162 Reynolds, Norman 11