Generating the When Dimension based on date and time tables
2024-01-08
Source:vignettes/when.Rmd
when.Rmd
Introduction
The When Dimension plays a fundamental role in Multidimensional Systems, it allows us to express when the analysed focus of attention have occurred.
This dimension corresponds to the generically called “Time Dimension”, as named in Kimball (1996) and Adamson (2010). Later it has also been called “Date Dimension” (date-based dimension) and “Time Dimension” (time-of-day-based dimension) to express granularity (Kimball and Ross 2013). We prefer to call it “When Dimension” because granularity is not involved in that term.
Although conceptually the date and time can be represented together in a single dimension, in ROLAP (Relational On-Line Analytical Processing) star database systems, it is common to implement these concepts separately, sometimes in separate tables “to avoid a row count explosion in the date dimension” (Kimball and Ross 2013) by adding the time for each day.
In Kimball and Ross (2013) these dimensions are considered static dimensions: they are built at the beginning of the project for the period under consideration and are referenced when adding the instances to the fact table. As mentioned there: “Typically, these dimensions are built in an afternoon with a spreadsheet.”
The purpose of the when
package is to assist in the
implementation of the When Dimension in
Multidimensional Systems implemented on a ROLAP star database,
regardless of its granularity, with the following features:
It supports the generation of tables with the granularity needed (second, minute, hour, date, week, month, quarter, semester or year) according to the relational star schema design criteria (Kimball and Ross 2013).
We consider separate tables for time-of-day data and data obtained from date to avoid a row count explosion if a single table were used.
It can be used to generate tables statically, indicating a period, or dynamically, indicating data series.
In addition to attributes at the selected granularity, other attributes can be added at coarser granularity to facilitate querying. We can also add a surrogate key.
We can obtain the tables in
tibble
format and in other formats (xlsx, csv or Relational DBMS table).
It relies on the functions offered by the lubridate
package to obtain the components from the date and time.
The rest of this document is structured as follows: First, the general process of defining these dimensions is presented. Next, we present dimension-specific aspects that we can define, both for date-based and time-based dimensions. Finally, the document ends with conclusions and bibliography.
Definition process
The definition process consists of the following steps:
- Creating an object.
- Definition of general aspects.
- Definition of instances.
- Selection of levels and attributes.
- Generation of the result.
- Getting the dimension table.
Steps 2, 3 and 4 can be done in any order, they can even be included in step 1.
Below we discuss each of them.
Creating an object
The definition process begins by creating an object of the
when
class.
By default, an object is created based on date, but we can also
create it based on time by indicating it using the type
parameter.
w_time <- when(type = 'time')
Virtually all configuration options can be defined using the object creation function. They can also be defined later using specific functions, as shown in the following section.
Definition of general aspects
For example, using the following combination of functions we get the same result as defining the type when we create the object.
w_time_2 <- when() |>
define_characteristics(type = 'time')
identical(w_time, w_time_2)
#> [1] TRUE
By default it considers the system language. Since our system is Spanish, for the literals of day and month names to appear in English, we must configure it explicitly.
w_date <- w_date |>
define_characteristics(locale = Sys.setlocale("LC_TIME", "English"))
In section Other configuration options we present the rest of the configuration options for dimension generation. For the examples that we are going to use, the default options of the package are enough.
Next we have to select and configure the levels and define the instances. These operations, along with the current one, can be performed in any order.
Definition of instances
We can define the instances by giving a range of values or by indicating the specific instances. The values in both cases will depend on the type of dimension (date-based or time-based).
We can indicate these values at the time of creating the class object
or through the define_instances()
function.
In the case of date, we can indicate a year, year and month (with the format
yyyy-mm
), or year month and day (yyyy-mm-dd
).In the case of time, we can indicate an hour, hour and minute (with the format
hh:mm
), or hour, minute and second (hh:mm:ss
).
In both cases we can use the string format, the one provided by the
lubridate
package or, when it is a single number (year or hour), we can use an
integer value.
For example, below they are used with the format of
lubridate
.
w_date <- w_date |>
define_instances(start = lubridate::today(),
end = lubridate::today() + lubridate::years(5))
In the following definitions we configure the dimensions so that they consider only the indicated dates. The second will be appropriate if we want to obtain the dimension at the year level, because we are indicating the specific values to consider (not a period).
w_date_2_1 <-
when(
values = c(
"2023-12-31",
"2023-01-01",
"2022-12-31",
"2022-01-01",
"2021-12-31",
"2021-01-01"
)
)
w_date_2_2 <- w_date |>
define_instances(values = 2020:2030)
The following definition will be appropriate at any level of detail because we are giving the range of values to consider (dates between those two years).
w_date_3 <- w_date |>
define_instances(start = 2020, end = 2030)
When only the year is indicated, it is considered its first day. Therefore the previous definition is equivalent to the following one.
w_date_4 <- w_date |>
define_instances(start = "2020-01-01", end = "2030-01-01")
identical(w_date_3, w_date_4)
#> [1] TRUE
In the case of time, by default, it is considered at second level and all seconds of the day.
w_time_3 <- w_time |>
define_instances(start = "00:00:00", end = "23:59:59")
identical(w_time, w_time_3)
#> [1] TRUE
We can also indicate a period or a set of times according to the indicated criteria. In the following example, the two definitions are equivalent.
w_time_4 <- w_time |>
define_instances(start = 8, end = 17)
w_time_5 <- w_time |>
define_instances(start = "08:00:00", end = "17:00:00")
identical(w_time_4, w_time_5)
#> [1] TRUE
Selection of levels and attributes
In the definition, by default, the attributes that we have considered
to be most frequently used have been included. We can consult them for
each level using the get_level_attribute_names()
function
with the parameter selected = TRUE
. If we define the
parameter as selected = FALSE
(default option), it will
return all available attributes. We can indicate the name of the level
or, if we do not indicate any, it will obtain the attributes of all the
levels. We obtain the level names using the
get_level_names()
function. Below are examples of using
these functions for the date-based dimension.
w_date |>
get_level_attribute_names(selected = TRUE)
#> [1] "date" "month_day" "week_day" "day_name"
#> [5] "day_num_name" "year_week" "week" "year_month"
#> [9] "month" "month_name" "month_num_name" "year"
w_date |>
get_level_names()
#> [1] "day" "week" "month" "quarter" "semester" "year"
w_date |>
get_level_attribute_names(name = 'month', selected = TRUE)
#> [1] "year_month" "month" "month_name" "month_num_name"
w_date |>
get_level_attribute_names(name = 'month')
#> [1] "year_month" "month" "month_name" "month_num_name"
#> [5] "month_abbr" "month_num_abbr"
Also for the time-based dimension.
w_time |>
get_level_attribute_names()
#> [1] "time" "hour" "minute" "second" "day_part"
w_time |>
get_level_names()
#> [1] "time"
There is a function to configure each level. We can select or
deselect each attribute individually using the attribute name as a
boolean parameter of the function. We can also select or deselect all
attributes defined at the level using the special
include_all
and exclude_all
parameters: Once
all the attributes have been updated with these special parameters, in
the same function call, we can select or deselect the attributes we
want.
For example, if we do not want to include the name of the month, we
can configure it using the select_month_level()
function or
at the time of object creation with a parameter of the same name, as
shown below.
w_date_5 <- w_date |>
select_month_level(month_name = FALSE)
w_date_6 <- when(
start = lubridate::today(),
end = lubridate::today() + lubridate::years(5),
month_name = FALSE
)
identical(w_date_5, w_date_6)
#> [1] TRUE
w_date_5 |>
get_level_attribute_names(name = 'month', selected = TRUE)
#> [1] "year_month" "month" "month_num_name"
If we only want to include the name of the month, we can do it as follows.
w_date_7 <- w_date |>
select_month_level(exclude_all = TRUE, month_name = TRUE)
w_date_7 |>
get_level_attribute_names(name = 'month', selected = TRUE)
#> [1] "month_name"
There is available a similar function for each level: time, date, week, month, quarter, semester and year.
In the case of the date-based dimension levels, we can select or
deselect them directly using the select_date_levels()
function. It allows us to treat the levels globally in the same way that
we have been treating the attributes. For example, to exclude the month
level we can do the following.
w_date_8 <- w_date |>
select_date_levels(month_level = FALSE)
w_date_8 |>
get_level_attribute_names(name = 'month', selected = TRUE)
#> character(0)
We can also perform that same operation at object creation time, as shown below.
w_date_9 <- when(
start = lubridate::today(),
end = lubridate::today() + lubridate::years(5),
month_level = FALSE
)
In the case of a time-based dimension, as we have seen, we only consider one level called “time”, as shown again below.
w_time |>
get_level_names()
#> [1] "time"
We can configure the attributes it includes using the
select_time_level()
function. The attribute called “hour”
must always be included. If we do not include the “minute” attribute
then “second” will not be included either. Below are some examples of
definition and the result obtained.
w_time_6 <- w_time |>
select_time_level(exclude_all = TRUE)
w_time_6 |>
get_level_attribute_names(selected = TRUE)
#> [1] "hour"
w_time_7 <- w_time |>
select_time_level(minute = FALSE)
w_time_7 |>
get_level_attribute_names(selected = TRUE)
#> [1] "time" "hour" "day_part"
Generation of the result
Once the characteristics of the dimension are defined, we can generate its table.
Although we have not yet generated the dimension table, the following function returns the names of the attributes it will contain. They can be obtained as a string or as a vector of strings.
w_date |>
get_table_attribute_names(as_string = FALSE)
#> [1] "id" "date" "month_day" "week_day"
#> [5] "day_name" "day_num_name" "year_week" "week"
#> [9] "year_month" "month" "month_name" "month_num_name"
#> [13] "year"
If it is generated in the form of a string
(as_string = TRUE
, default option), the result can be used
to easily change the name of the attributes, when the table is
generated, using the set_table_attribute_names()
function,
if deemed necessary, as shown in section Renaming
the attributes.
When we consider that the dimension is already properly defined, we
can generate its corresponding table using the
generate_table()
function. We update the definition of the
object. Any errors found will be notified to us.
w_date <- w_date |>
generate_table()
w_time <- w_time |>
generate_table()
Once the table is generated, we can export it. If we want to access the table and it is not yet generated, it will send us an error message.
If we make changes to the configuration, it is our responsibility to generate the table again.
Getting the dimension table
We can access the dimension table to work with it in R (for example
using the rolap
package), or export it in the format that is most convenient for us to
work with other tools.
Next we access the generated tables and display them. First of all for the date.
t_date <- w_date |>
get_table()
The first and last rows of the obtained result are shown below.
pander::pandoc.table(rbind(head(t_date, 5), tail(t_date, 5)),
split.table = Inf)
id | date | month_day | week_day | day_name | day_num_name | year_week | week | year_month | month | month_name | month_num_name | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2024-01-09 | 09 | 2 | Tuesday | 2-Tuesday | 2024-02 | 02 | 2024-01 | 01 | January | 01-January | 2024 |
2 | 2024-01-10 | 10 | 3 | Wednesday | 3-Wednesday | 2024-02 | 02 | 2024-01 | 01 | January | 01-January | 2024 |
3 | 2024-01-11 | 11 | 4 | Thursday | 4-Thursday | 2024-02 | 02 | 2024-01 | 01 | January | 01-January | 2024 |
4 | 2024-01-12 | 12 | 5 | Friday | 5-Friday | 2024-02 | 02 | 2024-01 | 01 | January | 01-January | 2024 |
5 | 2024-01-13 | 13 | 6 | Saturday | 6-Saturday | 2024-02 | 02 | 2024-01 | 01 | January | 01-January | 2024 |
1824 | 2029-01-05 | 05 | 5 | Friday | 5-Friday | 2029-01 | 01 | 2029-01 | 01 | January | 01-January | 2029 |
1825 | 2029-01-06 | 06 | 6 | Saturday | 6-Saturday | 2029-01 | 01 | 2029-01 | 01 | January | 01-January | 2029 |
1826 | 2029-01-07 | 07 | 7 | Sunday | 7-Sunday | 2029-01 | 01 | 2029-01 | 01 | January | 01-January | 2029 |
1827 | 2029-01-08 | 08 | 1 | Monday | 1-Monday | 2029-02 | 02 | 2029-01 | 01 | January | 01-January | 2029 |
1828 | 2029-01-09 | 09 | 2 | Tuesday | 2-Tuesday | 2029-02 | 02 | 2029-01 | 01 | January | 01-January | 2029 |
Let’s generate the dimension at the week level including all its attributes (if we change the definition, we have to generate the table again).
t_date <- w_date |>
select_date_levels(day_level = FALSE) |>
select_week_level(include_all = TRUE) |>
generate_table() |>
get_table()
The first and last rows of the obtained result are shown below.
pander::pandoc.table(rbind(head(t_date, 5), tail(t_date, 5)),
split.table = Inf)
id | year_week | week | year_month | month | month_name | month_num_name | year |
---|---|---|---|---|---|---|---|
1 | 2024-02 | 02 | 2024-01 | 01 | January | 01-January | 2024 |
2 | 2024-03 | 03 | 2024-01 | 01 | January | 01-January | 2024 |
3 | 2024-04 | 04 | 2024-01 | 01 | January | 01-January | 2024 |
4 | 2024-05 | 05 | 2024-01 | 01 | January | 01-January | 2024 |
5 | 2024-05 | 05 | 2024-02 | 02 | February | 02-February | 2024 |
310 | 2028-51 | 51 | 2028-12 | 12 | December | 12-December | 2028 |
311 | 2028-52 | 52 | 2028-12 | 12 | December | 12-December | 2028 |
312 | 2028-53 | 53 | 2028-12 | 12 | December | 12-December | 2028 |
313 | 2029-01 | 01 | 2029-01 | 01 | January | 01-January | 2029 |
314 | 2029-02 | 02 | 2029-01 | 01 | January | 01-January | 2029 |
Next for the time.
t_time <- w_time |>
get_table()
The start and end of the result table is shown below.
pander::pandoc.table(rbind(head(t_time, 5), tail(t_time, 5)),
split.table = Inf)
id | time | hour | minute | second | day_part |
---|---|---|---|---|---|
1 | 00:00:00 | 00 | 00 | 00 | Night |
2 | 00:00:01 | 00 | 00 | 01 | Night |
3 | 00:00:02 | 00 | 00 | 02 | Night |
4 | 00:00:03 | 00 | 00 | 03 | Night |
5 | 00:00:04 | 00 | 00 | 04 | Night |
86396 | 23:59:55 | 23 | 59 | 55 | Night |
86397 | 23:59:56 | 23 | 59 | 56 | Night |
86398 | 23:59:57 | 23 | 59 | 57 | Night |
86399 | 23:59:58 | 23 | 59 | 58 | Night |
86400 | 23:59:59 | 23 | 59 | 59 | Night |
day_part
is predefined in English. Literals or
associated hours can be accessed and changed using the
get_day_part()
and set_day_part()
functions,
as shown in section Names of the parts of the
day.
Now we are going to generate it at the minute level.
t_time <- w_time |>
select_time_level(second = FALSE) |>
generate_table() |>
get_table()
The start and end from the new result table is shown below.
pander::pandoc.table(rbind(head(t_time, 5), tail(t_time, 5)),
split.table = Inf)
id | time | hour | minute | day_part |
---|---|---|---|---|
1 | 00:00:00 | 00 | 00 | Night |
2 | 00:01:00 | 00 | 01 | Night |
3 | 00:02:00 | 00 | 02 | Night |
4 | 00:03:00 | 00 | 03 | Night |
5 | 00:04:00 | 00 | 04 | Night |
1436 | 23:55:00 | 23 | 55 | Night |
1437 | 23:56:00 | 23 | 56 | Night |
1438 | 23:57:00 | 23 | 57 | Night |
1439 | 23:58:00 | 23 | 58 | Night |
1440 | 23:59:00 | 23 | 59 | Night |
We can take the generated tables to any Relational DBMS, as shown below, we just have to pass a connection to the database as a parameter.
my_db <- DBI::dbConnect(RSQLite::SQLite())
w_date |>
get_table_rdb(my_db)
w_time |>
get_table_rdb(my_db)
DBI::dbListTables(my_db)
#> [1] "date" "time"
DBI::dbDisconnect(my_db)
In the previous example, since we have not explicitly defined any names for the tables in the configuration, they have been assigned as a name the element on which they are based (date or time). We can define these names, as shown in section Table name.
Functions to get the tables in xlsx and csv format
are also available: get_table_xlsx()
and
get_table_csv()
.
Other configuration options
In section Definition of general aspects we have presented the necessary configuration options for the examples we have used. In particular, we have only needed to define the type on which the dimension is based (time or date) and the language for naming days and months (because it is different from the language of our operating system). The default options have been adequate for the examples shown.
In this section we are going to show other configuration options that may be necessary to define our dimensions.
Dimension table features
The result of a definition operation performed with the
where
package is a table of a dimension at a given level of
granularity, with the attributes selected for that level or associated
coarser levels. So far we have focused on level attributes and table
instances. In this section we are going to configure other
characteristics of it.
Table name
By default, the table is assigned the name of the base component used to generate the dimension: date or time. We have seen it in the tables that we have stored in the database, in the example of the section Getting the dimension table.
We can define the name of the dimension at the time of object
creation or through the define_characteristics()
function,
as shown below.
wd_1 <- when(name = 'dim_where')
wd_2 <- when() |>
define_characteristics(name = 'dim_where')
This name is the one that will be assigned to the database table where we store it or to the files we obtain. Below is the case for a relational database.
my_db <- DBI::dbConnect(RSQLite::SQLite())
wd_1 |>
generate_table() |>
get_table_rdb(my_db)
DBI::dbListTables(my_db)
#> [1] "dim_where"
DBI::dbDisconnect(my_db)
Also the case of obtaining it in csv format (if we do not indicate a location, it stores it in one by default).
wd_2 |>
generate_table() |>
get_table_csv()
#> [1] "/tmp/RtmpniALsX/dim_where.csv"
Surrogate key
By default the dimension table will include a surrogate key. In some situations we may be interested in the table having the date itself as its primary key (for example if we want to partition the fact table by the date).
Through the surrogate_key
parameter, included in the
define_characteristics()
function and also the
when()
function, we can indicate that a surrogate key is
not created, as it’s shown in the following.
when() |>
get_table_attribute_names(as_string = FALSE)
#> [1] "id" "date" "month_day" "week_day"
#> [5] "day_name" "day_num_name" "year_week" "week"
#> [9] "year_month" "month" "month_name" "month_num_name"
#> [13] "year"
when(surrogate_key = FALSE) |>
get_table_attribute_names(as_string = FALSE)
#> [1] "date" "month_day" "week_day" "day_name"
#> [5] "day_num_name" "year_week" "week" "year_month"
#> [9] "month" "month_name" "month_num_name" "year"
The name of the surrogate key is id
, is the first
attribute of the table. If necessary, we can rename the attributes,
including the surrogate key as shown in the next section.
Renaming the attributes
If we get a table in tibble
format, we can rename its
attributes using the functions of this class. But if we want to obtain
it in a file or in a database, we may be interested in renaming its
attributes before obtaining it.
In order to rename the attributes, we first have to generate the table.
wd_3 <- when() |>
generate_table()
We can rename them using the set_table_attribute_names()
function, passing a name for each of the available attributes. To
facilitate the operation, using the
get_table_attribute_names()
function, we can obtain the
attributes in string format. With this we only have to replace the names
that we want to change, as shown below.
wd_3 |>
get_table_attribute_names()
#> [1] "c('id', 'date', 'month_day', 'week_day', 'day_name', 'day_num_name', 'year_week', 'week', 'year_month', 'month', 'month_name', 'month_num_name', 'year')"
wd_3 <- wd_3 |>
set_table_attribute_names(
c(
'id_when',
'date',
'month_day',
'week_day',
'day_name',
'day_num_name',
'year_week',
'week',
'year_month',
'month',
'month_name',
'month_num_name',
'year'
)
)
wd_3 |>
get_table_attribute_names(as_string = FALSE)
#> [1] "id_when" "date" "month_day" "week_day"
#> [5] "day_name" "day_num_name" "year_week" "week"
#> [9] "year_month" "month" "month_name" "month_num_name"
#> [13] "year"
Names of the parts of the day
We have associated each hour with the usual part of the day in
English. We can consult them and change them if necessary (for example,
to define them in another language) using the
get_day_part()
and set_day_part()
functions.
Below is an example where the definition is modified to consider only two parts in the day.
when() |>
get_day_part()
#> 00 01 02 03 04 05
#> "Night" "Night" "Night" "Night" "Night" "Morning"
#> 06 07 08 09 10 11
#> "Morning" "Morning" "Morning" "Morning" "Morning" "Morning"
#> 12 13 14 15 16 17
#> "Afternoon" "Afternoon" "Afternoon" "Afternoon" "Afternoon" "Evening"
#> 18 19 20 21 22 23
#> "Evening" "Evening" "Evening" "Night" "Night" "Night"
when() |>
set_day_part(hour = c(20:23, 0:5), name = "Night") |>
set_day_part(hour = c(6:19), name = "Day") |>
get_day_part()
#> 00 01 02 03 04 05 06 07 08 09
#> "Night" "Night" "Night" "Night" "Night" "Night" "Day" "Day" "Day" "Day"
#> 10 11 12 13 14 15 16 17 18 19
#> "Day" "Day" "Day" "Day" "Day" "Day" "Day" "Day" "Day" "Day"
#> 20 21 22 23
#> "Night" "Night" "Night" "Night"
Configuration of day and week levels
This section shows the configuration options available for the day and week levels.
The first day of the week
For days, in addition to the language of the name of the day of the week, we can define the start day of the week. There are two possibilities: Monday or Sunday. This determines the numbering of the days.
We can define it using the parameter week_starts_monday
(which by default has the value TRUE) in the functions
define_characteristics()
or when()
, as shown
below.
wd_1 <- when(week_starts_monday = FALSE)
wd_2 <- when() |>
define_characteristics(week_starts_monday = FALSE)
Numbering of weeks in the year
Using the value of the type
parameter
(type = "date"
or type = "time"
) in the
define_characteristics()
or when()
functions
we can indicate whether the dimension is based on date or time.
This parameter can additionally take two more values: “iso” and “epi”. Using these values (together with the “date” value) we can indicate the type of numbering of the weeks in the year. That is, the week number associated with each date depends on the type of date dimension selected:
Default (
type = "date"
): It numbers blocks of 7 days beginning on January 1 (regardless of what day of the week it is). The last week of the year will last less than 7 days.ISO 8601 (
type = "iso"
): It numbers blocks of 7 days from Monday to Sunday. The first and last week of the year can contain days from the previous or next year.Epidemiological (
type = "epi"
): This week is like ISO 8601 only that it considers that the week begins on Sunday.
For the first and last days of the year, the year associated with the week may be different from the year of the date, depending on the type selected (“iso” or “epi”).
The “year” attribute (alone or in combination with other attributes) displays the year associated with the date, regardless of the type selected. Only the “year_week” attribute displays the year of the week according to the selected type.
The next section shows how to obtain the attribute obtaining function and how to change it if we consider it necessary (for example, to get the different year types together).
Get date range of weeks
If we want to work with the dimension at the week level, we may be
interested in having the start or end date for each week. To obtain them
and be able to use them to generate the dimension, we have the
get_week_date_range()
function.
t <- get_week_date_range(start = "2024-01-01", end = "2029-12-31")
A period and the type of week numbering (“date”, “iso” or “epi”) are
indicated (by default it is “date”). Returns the result in
tibble
form.
The start and end from the new result is shown below.
pander::pandoc.table(rbind(head(t, 5), tail(t, 5)),
split.table = Inf)
year_week | first | last |
---|---|---|
2024-01 | 2024-01-01 | 2024-01-07 |
2024-02 | 2024-01-08 | 2024-01-14 |
2024-03 | 2024-01-15 | 2024-01-21 |
2024-04 | 2024-01-22 | 2024-01-28 |
2024-05 | 2024-01-29 | 2024-02-04 |
2029-49 | 2029-12-03 | 2029-12-09 |
2029-50 | 2029-12-10 | 2029-12-16 |
2029-51 | 2029-12-17 | 2029-12-23 |
2029-52 | 2029-12-24 | 2029-12-30 |
2029-53 | 2029-12-31 | 2029-12-31 |
We can use the column of the dates that interest us to generate the dimension. Additionally, we leave only the date and the levels that interest us (week and year).
tw <- when(values = t[["last"]], month_level = FALSE) |>
select_day_level(exclude_all = TRUE, date = TRUE) |>
generate_table() |>
get_table()
The start and end from the new result table is shown below.
pander::pandoc.table(rbind(head(tw, 5), tail(tw, 5)),
split.table = Inf)
id | date | year_week | week | year |
---|---|---|---|---|
1 | 2024-01-07 | 2024-01 | 01 | 2024 |
2 | 2024-01-14 | 2024-02 | 02 | 2024 |
3 | 2024-01-21 | 2024-03 | 03 | 2024 |
4 | 2024-01-28 | 2024-04 | 04 | 2024 |
5 | 2024-02-04 | 2024-05 | 05 | 2024 |
314 | 2029-12-09 | 2029-49 | 49 | 2029 |
315 | 2029-12-16 | 2029-50 | 50 | 2029 |
316 | 2029-12-23 | 2029-51 | 51 | 2029 |
317 | 2029-12-30 | 2029-52 | 52 | 2029 |
318 | 2029-12-31 | 2029-53 | 53 | 2029 |
Change the definition function of an attribute
Each attribute is defined by a specific function. All definition functions have the same structure: they receive a table, a column with the date or time values, possibly additional configuration parameters, and return the table to which a new column with the attribute name and values has been added, calculated from the values.
We can get the definition function of any attribute using the
get_attribute_definition_function()
function. Below is the
function for the “year” attribute and also the function of the year and
week combination, “year_week” attribute (it uses one of the additional
parameters available, the dimension type).
wd <- when()
wd |>
get_attribute_definition_function(name = "year")
#> function (table, values, ...)
#> {
#> table[["year"]] <- as.character(lubridate::year(values))
#> table
#> }
#> <bytecode: 0x558e7f304a88>
#> <environment: namespace:when>
wd |>
get_attribute_definition_function(name = "year_week")
#> function (table, values, ...)
#> {
#> dots <- list(...)
#> type <- dots[["type"]]
#> switch(type, iso = {
#> year <- lubridate::isoyear(values)
#> week <- lubridate::isoweek(values)
#> }, epi = {
#> year <- lubridate::epiyear(values)
#> week <- lubridate::epiweek(values)
#> }, {
#> year <- lubridate::year(values)
#> week <- lubridate::week(values)
#> })
#> table[["year_week"]] <- paste0(year, "-", sprintf("%02d",
#> week))
#> table
#> }
#> <bytecode: 0x558e7f2bede8>
#> <environment: namespace:when>
In view of the functions, we can define one with the same structure
and establish it as a new attribute definition function using the
set_attribute_definition_function()
function. For example,
below shows how to change the year attribute definition function to get
an additional attribute if the year associated with the week can be
different from the year of the date.
f <- function(table, values, ...) {
dots <- list(...)
type <- dots[['type']]
table[['year']] <- as.character(lubridate::year(values))
if (type == 'iso') {
table[['week_year']] <- as.character(lubridate::isoyear(values))
} else if (type == 'epi') {
table[['week_year']] <- as.character(lubridate::epiyear(values))
}
table
}
wd <- wd |>
set_attribute_definition_function(name = "year", f)
In this case, what we do is add two columns for the year: one with the year of the date and another with the year of the week if this can be different from the first. We can see the result by changing the type of the dimension and generating the table, as shown below (the two year attributes have been added).
t <- wd |>
define_characteristics(type = 'iso') |>
generate_table() |>
get_table()
names(t)
#> [1] "id" "date" "month_day" "week_day"
#> [5] "day_name" "day_num_name" "year_week" "week"
#> [9] "year_month" "month" "month_name" "month_num_name"
#> [13] "year" "week_year"
Conclusions
The when
package offers a set of operations that allow
us to generate the When Dimension based on date or time at the
level of detail we need.
We create an object of the when
class and, from it, we
configure the instances and characteristics of the levels and attributes
of the dimension. To define the instances we can indicate a range of
values or provide the specific values from which to generate the rest of
the attributes.
As a result, we can obtain tables for this dimension to work from R,
for example with the rolap
package, or work in other environments, including Relational DBMSs.