Introduction
Once we have developed a star database in R, we would like to exploit it directly in R by defining multidimensional queries, but that is part of a future project. Currently, we need to export the result to other formats in order to use other OLAP query tools.
This document shows the possibilities offered by the package in this
context: exporting an object of the star_database class to other
formats. After presenting the object that we will use as an example and
how to select it, a section is dedicated to each type of element
obtained: dm, list of tibble objects, relational databases, xlsx and csv
files, and geomultistar::multistar object. Finally, it finishes with the
conclusions.
Example star_database object
The variable mrs_db, obtained in the vignette titled Obtaining and
transforming flat tables, vignette("v05-flat-table-op"), contains an
object of class star_database that we will use in the examples.
class(mrs_db)
#> [1] "star_database"
We can see a representation of the tables it contains using the
draw_tables() function, as shown below.
mrs_db |>
  draw_tables()
Select only one star database
Strictly speaking, a star database is composed of a fact table and
several associated dimension tables. A constellation is made up of
several star databases that can share dimensions. In the rolap package
they are treated in a unified way under the star_database class: it is
used both to define star databases and to define constellations.
We may be interested in exporting only one star database of the
constellation, or a subset of them (which is also a constellation). In
this situation, the first step before exporting is to select the star
databases that interest us from the constellation. This can be done
using the get_star_database() function, to which we pass the names of
the star databases. The names can be obtained using the get_fact_names()
function.
mrs_db |>
  get_fact_names()
#> [1] "mrs_age" "mrs_cause"
Next, we select one of the star databases (also an object of class
star_database) and display its tables.
mrs_cause <- mrs_db |>
  get_star_database("mrs_cause")
class(mrs_cause)
#> [1] "star_database"
mrs_cause |>
  draw_tables()
dm class object
The dm package allows us to work in R with tables that correspond to
those of relational databases, both to import and to export them. It
also allows them to be represented graphically (the graphical
representations of the tables presented in the previous sections were
made using the dm package).
We can directly obtain an object of the dm class from the tables of our
star databases using the as_dm_class() function.
mrs_dm <- mrs_db |>
  as_dm_class()
class(mrs_dm)
#> [1] "dm"
mrs_dm
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `when`, `where`, `who`, `mrs_cause`, `mrs_age`
#> Columns: 25
#> Primary keys: 5
#> Foreign keys: 5
List of tibble objects
We can generate a tibble list from the component tables or from the
flat tables obtained from them.
From component tables
Using the as_tibble_list() function, we get a tibble list with the
dimension and fact tables, as shown below.
tl <- mrs_db |>
  as_tibble_list()
tl
#> $when
#> # A tibble: 1,966 × 4
#> when_key year week week_ending_date
#> <int> <chr> <chr> <chr>
#> 1 1 1962 " 1" 01/06/1962
#> 2 2 1962 " 2" 01/13/1962
#> 3 3 1962 " 5" 02/03/1962
#> 4 4 1962 " 6" 02/10/1962
#> 5 5 1962 " 8" 02/24/1962
#> 6 6 1962 " 9" 03/03/1962
#> 7 7 1962 "12" 03/24/1962
#> 8 8 1962 "14" 04/07/1962
#> 9 9 1962 "15" 04/14/1962
#> 10 10 1962 "18" 05/05/1962
#> # ℹ 1,956 more rows
#>
#> $where
#> # A tibble: 120 × 9
#> where_key region state city city_state status pop lat long
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 1 CT Bridgeport Bridgeport CT non-capi… " 1… 41.2 " -7…
#> 2 2 1 CT Hartford Hartford CT state ca… " 1… 41.8 " -7…
#> 3 3 1 CT New Haven New Haven CT non-capi… " 1… 41.3 " -7…
#> 4 4 1 CT Waterbury Waterbury CT non-capi… " 1… 41.6 " -7…
#> 5 5 1 MA Cambridge Cambridge MA non-capi… " 1… 42.4 " -7…
#> 6 6 1 MA Fall River Fall River MA non-capi… " … 41.7 " -7…
#> 7 7 1 MA Lowell Lowell MA non-capi… " 1… 42.6 " -7…
#> 8 8 1 MA Lynn Lynn MA non-capi… " … 42.5 " -7…
#> 9 9 1 MA New Bedford New Bedford MA non-capi… " … 41.7 " -7…
#> 10 10 1 MA Somerville Somerville MA non-capi… " … 42.4 " -7…
#> # ℹ 110 more rows
#>
#> $who
#> # A tibble: 5 × 2
#> who_key age
#> <int> <chr>
#> 1 1 1-24 years
#> 2 2 25-44 years
#> 3 3 45-64 years
#> 4 4 65+ years
#> 5 5 <1 year
#>
#> $mrs_cause
#> # A tibble: 3,342 × 5
#> when_key where_key pneumonia_and_influenza_deaths all_deaths nrow_agg
#> <int> <int> <int> <int> <int>
#> 1 1 87 2 69 1
#> 2 2 8 1 28 1
#> 3 2 10 2 21 1
#> 4 2 64 10 220 1
#> 5 2 97 2 12 1
#> 6 3 15 2 32 1
#> 7 4 6 0 31 1
#> 8 4 15 1 32 1
#> 9 4 119 2 51 1
#> 10 5 36 0 39 1
#> # ℹ 3,332 more rows
#>
#> $mrs_age
#> # A tibble: 16,565 × 5
#> when_key where_key who_key all_deaths nrow_agg
#> <int> <int> <int> <int> <int>
#> 1 1 87 1 1 1
#> 2 1 87 2 5 1
#> 3 1 87 3 17 1
#> 4 1 87 4 41 1
#> 5 1 87 5 5 1
#> 6 2 8 1 0 1
#> 7 2 8 2 2 1
#> 8 2 8 3 11 1
#> 9 2 8 4 14 1
#> 10 2 8 5 1 1
#> # ℹ 16,555 more rows
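Because the result is an ordinary named list of tibbles, it can be queried with general-purpose tools such as dplyr. The following is only a minimal sketch: tl_toy is a hypothetical, hand-made list that imitates the structure of the who and mrs_age tables shown above (it is not the real data), and the query joins the fact table to its dimension through the surrogate key before aggregating.

```r
library(dplyr)

# Hypothetical toy list imitating the structure of the exported tables
# (same column names as above, invented values).
tl_toy <- list(
  who = tibble::tibble(
    who_key = 1:2,
    age = c("1-24 years", "25-44 years")
  ),
  mrs_age = tibble::tibble(
    who_key = c(1L, 2L, 2L),
    all_deaths = c(1L, 5L, 17L)
  )
)

# Join the fact table to the dimension through the surrogate key,
# then aggregate the measure by a dimension attribute.
deaths_by_age <- tl_toy$mrs_age |>
  inner_join(tl_toy$who, by = "who_key") |>
  group_by(age) |>
  summarise(all_deaths = sum(all_deaths), .groups = "drop")

deaths_by_age
```

The same pattern should apply to the list returned by as_tibble_list(), using the corresponding table and key names.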
From flat tables
Using the as_single_tibble_list() function, we also get a tibble list,
but in this case the fact and dimension tables have been merged to form
a flat table for each fact table, as shown below.
tl <- mrs_db |>
  as_single_tibble_list()
tl
#> $mrs_cause
#> # A tibble: 3,342 × 14
#> year week week_ending_date region state city city_state status pop lat
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1962 " 1" 01/06/1962 7 OK Tulsa Tulsa OK non-c… " 3… 36.1
#> 2 1962 " 2" 01/13/1962 1 MA Lynn Lynn MA non-c… " … 42.5
#> 3 1962 " 2" 01/13/1962 1 MA Some… Somervill… non-c… " … 42.4
#> 4 1962 " 2" 01/13/1962 5 DC Wash… Washingto… non-c… " 5… 38.9
#> 5 1962 " 2" 01/13/1962 8 CO Colo… Colorado … non-c… " 3… 38.9
#> 6 1962 " 5" 02/03/1962 2 NJ Eliz… Elizabeth… non-c… " 1… 40.7
#> 7 1962 " 6" 02/10/1962 1 MA Fall… Fall Rive… non-c… " … 41.7
#> 8 1962 " 6" 02/10/1962 2 NJ Eliz… Elizabeth… non-c… " 1… 40.7
#> 9 1962 " 6" 02/10/1962 9 WA Spok… Spokane WA non-c… " 1… 47.7
#> 10 1962 " 8" 02/24/1962 3 IL Rock… Rockford … non-c… " 1… 42.3
#> # ℹ 3,332 more rows
#> # ℹ 4 more variables: long <chr>, pneumonia_and_influenza_deaths <int>,
#> # all_deaths <int>, nrow_agg <int>
#>
#> $mrs_age
#> # A tibble: 16,565 × 14
#> year week week_ending_date region state city city_state status pop lat
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1962 " 1" 01/06/1962 7 OK Tulsa Tulsa OK non-c… " 3… 36.1
#> 2 1962 " 1" 01/06/1962 7 OK Tulsa Tulsa OK non-c… " 3… 36.1
#> 3 1962 " 1" 01/06/1962 7 OK Tulsa Tulsa OK non-c… " 3… 36.1
#> 4 1962 " 1" 01/06/1962 7 OK Tulsa Tulsa OK non-c… " 3… 36.1
#> 5 1962 " 1" 01/06/1962 7 OK Tulsa Tulsa OK non-c… " 3… 36.1
#> 6 1962 " 2" 01/13/1962 1 MA Lynn Lynn MA non-c… " … 42.5
#> 7 1962 " 2" 01/13/1962 1 MA Lynn Lynn MA non-c… " … 42.5
#> 8 1962 " 2" 01/13/1962 1 MA Lynn Lynn MA non-c… " … 42.5
#> 9 1962 " 2" 01/13/1962 1 MA Lynn Lynn MA non-c… " … 42.5
#> 10 1962 " 2" 01/13/1962 1 MA Lynn Lynn MA non-c… " … 42.5
#> # ℹ 16,555 more rows
#> # ℹ 4 more variables: long <chr>, age <chr>, all_deaths <int>, nrow_agg <int>
Relational database
To export the component tables to a relational database, we can use the
as_rdb() function. We have to pass it a connection to the database as a
parameter; opening and closing that connection is our responsibility.
con <- DBI::dbConnect(RSQLite::SQLite())
mrs_db |>
  as_rdb(con)
DBI::dbListTables(con)
#> [1] "mrs_age" "mrs_cause" "when" "where" "who"
DBI::dbDisconnect(con)
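Once the tables are in the database, they can be queried with SQL while the connection is still open. One detail worth noting is that the dimension names when and where are reserved words in SQL, so they have to be quoted in queries. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite database with small hand-made tables that only imitate the names and keys of the exported ones (the values are invented), and it relies on DBI::dbQuoteIdentifier() to quote the table name.

```r
library(DBI)

# In-memory database with toy stand-ins for the exported tables
# (invented values, same table and key names as in the example).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "where",
             data.frame(where_key = 1:2, state = c("CT", "MA")))
dbWriteTable(con, "mrs_cause",
             data.frame(where_key = c(1L, 1L, 2L),
                        all_deaths = c(10L, 5L, 7L)))

# "where" is a reserved word in SQL, so the identifier must be quoted.
where_tbl <- as.character(dbQuoteIdentifier(con, "where"))

sql <- paste0(
  "SELECT w.state, SUM(f.all_deaths) AS all_deaths ",
  "FROM mrs_cause f JOIN ", where_tbl, " w ",
  "ON f.where_key = w.where_key ",
  "GROUP BY w.state ORDER BY w.state"
)
res <- dbGetQuery(con, sql)

dbDisconnect(con)
res
```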
Additionally, the vignette titled Deployment of star databases with
incremental refresh, vignette("v50-deploy"), describes how to deploy a
star database in a relational database so that it is automatically
updated through periodic refresh operations.
xlsx file
Using the as_xlsx_file() function, we get an xlsx file where each table
is stored in a sheet. We have to indicate the name of the file. For the
example, we use a temporary file.
f <- mrs_db |>
  as_xlsx_file(file = tempfile())
f
#> [1] "/tmp/Rtmpu3gS29/file30c23f633213.xlsx"
As we can see, the function takes care of adding the appropriate extension.
csv files
Using the as_csv_files() function, we can store each table in a
different csv file, in the indicated folder, as shown below.
d <- mrs_db |>
  as_csv_files(dir = tempdir())
list.files(d, pattern = "\\.csv$")
#> [1] "mrs_age.csv" "mrs_cause.csv" "when.csv" "where.csv"
#> [5] "who.csv"
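A folder of csv files like this one can be read back into a named list with base R. The sketch below is only illustrative: it creates its own folder (csv_demo, a hypothetical name) with two small hand-made files that imitate the who and mrs_age tables, so that it can run on its own. With the real export, the folder returned by as_csv_files() would take the place of d.

```r
# Toy folder standing in for the directory returned by as_csv_files()
# (hypothetical folder name, invented values).
d <- file.path(tempdir(), "csv_demo")
dir.create(d, showWarnings = FALSE)
write.csv(data.frame(who_key = 1:2, age = c("1-24 years", "25-44 years")),
          file.path(d, "who.csv"), row.names = FALSE)
write.csv(data.frame(who_key = c(1L, 2L, 2L), all_deaths = c(1L, 5L, 17L)),
          file.path(d, "mrs_age.csv"), row.names = FALSE)

# Read every csv file in the folder and name each element after its file.
files <- list.files(d, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, read.csv)
names(tables) <- tools::file_path_sans_ext(basename(files))

names(tables)
```

Note that read.csv() guesses column types, so columns stored as text in the star database (for example year or week) may be read back as numbers unless colClasses is specified.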
geomultistar::multistar object
Using the as_multistar() function, we can get a geomultistar::multistar
object; with this object we can use the query functions and the
functions for integration with geographic information offered by the
geomultistar package.
ms <- mrs_db |>
  as_multistar()
class(ms)
#> [1] "multistar"