To define facts in a dimensional_model object, the essential data is a name
and a set of measurements, which can be empty (a fact with no explicit
measurements). Each measurement requires an associated aggregation function,
which defaults to SUM.
Usage
define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)
# S3 method for dimensional_model
define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)
Arguments
- st
A dimensional_model object.
- name
A string, name of the fact.
- measures
A vector of measure names.
- agg_functions
A vector of aggregation function names, one for each measure. Each can be SUM, MAX or MIN. If none is indicated, the default is SUM.
- nrow_agg
A string, measurement name for the number of rows aggregated.
Details
To get a star schema (a star_schema object) we need a flat table
(implemented through a tibble) and a dimensional_model object. The
definition of facts in the dimensional_model object is made from the flat
table column names. Using the dput function we can list the column names of
the flat table so that we do not have to type their names.
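As a minimal sketch of the dput step, the following uses a small made-up flat table (the flat_table object and its contents are assumptions for illustration, not package data). It only shows how dput prints the column names in a form that can be pasted into the measures argument.

```r
# Hypothetical flat table; check.names = FALSE keeps the space in "Age Range".
flat_table <- data.frame(
  "Year" = c(1962, 1962),
  "Age Range" = c("1: <1 year", "2: 1-24 years"),
  "Deaths" = c(3, 5),
  check.names = FALSE
)

# Print the column names as an R expression that can be copied
# and trimmed down to the columns that are measures.
dput(colnames(flat_table))
```

The printed vector can then be edited by hand, keeping only the measure columns.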
Each measurement has an associated aggregation function, which can be SUM, MAX or MIN. Mean is not among the possible aggregation functions because the mean of the means of subsets of the data does not necessarily equal the mean of the whole data.
An additional measurement with the number of aggregated rows is always added. Together with SUM, it allows the mean to be obtained if needed.
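The point about the mean can be illustrated with a short base R sketch. It does not use the package; the flat_table data and the variable names are assumptions chosen for illustration, with the per-group row count playing the role of the nrow_agg measurement.

```r
# Hypothetical flat table: two groups of different sizes.
flat_table <- data.frame(
  city   = c("A", "A", "A", "B"),
  deaths = c(10, 20, 30, 100)
)

# Aggregate per city: SUM of the measure and number of rows aggregated.
sums     <- tapply(flat_table$deaths, flat_table$city, sum)
nrow_agg <- tapply(flat_table$deaths, flat_table$city, length)

# Averaging the per-group means ignores the group sizes.
mean_of_means <- mean(sums / nrow_agg)

# SUM together with the row count recovers the mean of the whole data.
recovered_mean <- sum(sums) / sum(nrow_agg)

print(c(mean_of_means = mean_of_means,
        recovered_mean = recovered_mean,
        overall_mean = mean(flat_table$deaths)))
```

Because the groups have different sizes, the mean of the group means differs from the overall mean, while the ratio of the summed measure to the summed row counts matches it.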
See also
Other star definition functions: define_dimension(), dimensional_model()
Examples
# dput(colnames(mrs_age))
#
# c(
# "Reception Year",
# "Reception Week",
# "Reception Date",
# "Data Availability Year",
# "Data Availability Week",
# "Data Availability Date",
# "Year",
# "WEEK",
# "Week Ending Date",
# "REGION",
# "State",
# "City",
# "Age Range",
# "Deaths"
# )
dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths"),
    agg_functions = c("SUM"),
    nrow_agg = "nrow_agg"
  )
dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths")
  )
dm <- dimensional_model() |>
  define_fact(name = "Factless fact")