0.1 Rationale

Project 1: Reliability Analyses

Objective: Work Package 2 (WP2), validation and measurement invariance across countries. We will examine the quality, validity, and reliability of online abuse and reporting measures across countries and child demographics in the Disrupting Harm dataset. This will result in an evaluation report of OCSEA measures, and we will share the analytical code for our measurement tools. The results will be included in a forthcoming best-practice toolbox for measuring OCSEA, which will be continuously updated by the project teams at Cambridge, LSE and UNICEF until the end of the Disrupting Harm project in 2025.

Method: We will use item response theory and factor-analytic approaches to assess whether the measurement of latent constructs is reliable across countries. This analysis will identify which of the currently deployed global OCSEA measurement tools are reliable and can be recommended for use by other researchers across diverse cultural contexts. WP2 will result in a report detailing the quality of the various measures, serving as valuable input for future research. We will also share our materials so that fellow researchers can readily employ our OCSEA measurements, together with the associated code for data cleaning, processing, and analysis. Notably, these materials will accelerate the analysis of forthcoming studies on OCSEA, such as UNICEF's ongoing Disrupting Harm project, which is currently in-field with another 12 national surveys in LMICs.

Output: A minimum five-page report detailing the different measures across national datasets (draft already written by Sebastian) and reliability analyses of the OCSEA questions in the Disrupting Harm dataset.

0.2 Preparation

0.2.1 Loading packages

The package environment is initialised with the function load_packages, a bespoke installer written by MR.

## ------------------ init environment ------------------ ##

# load custom functions
source("src/fn/f_misc.R") # misc functions

# load libraries and dependencies
pkg <- c(
  'tidyverse',
  'haven',
  'psych',
  'lavaan',
  'semTools',
  'semPlot',
  'mirt', # item response theory
  'labelled',
  'fabricatr',
  'splitstackshape',
  'openxlsx',
  'sjlabelled',
  'kableExtra',
  'ggsci',
  'table1',
  'ggcorrplot',
  'DT',
  'data.table',
  'sjPlot',
  'naniar', # missing data vis
  'caret',
  'matrixcalc', # matrix operations (check is positive definite)
  'mice', # multiple imputation
  'ggVennDiagram', # venn diagram
  'UpSetR'
)

load_packages(pkg)

# load study specific functions
source("src/fn/f_getdata.R") # functions to load the UNICEF data

Also setting the ggplot2 theme.

theme_set(papaja::theme_apa())

0.2.2 Importing data

# import data using preprocessing function
data_raw <- get_unicef_data()

abuse_item_ref <- get_unicef_abuse_ref()

0.2.3 Digital Health Questionnaire

Selecting relevant DH items:

  • F6. In the past year, how often have any of these things happened to you? - I have seen sexual images or videos online because I wanted to [1-5]

  • F6. In the past year, how often have any of these things happened to you? - I have seen sexual images or videos online when I wasn’t expecting it [1-5]

  • G1a. In the past year, how often have these things happened to you? - Someone made sexual comments about me [1-5]

  • G1c. In the past year, how often have these things happened to you? - Someone sent me sexual images I did not want [1-5]

  • G3b. In the past year, how often have these things happened to you? - I have been asked to talk about sex or sexual acts with someone when I did not want to [1-5]

  • G3c. In the past year, how often have these things happened to you? - I have been asked by someone to do something sexual when I did not want to [1-5]

  • G3d. In the past year, how often have these things happened to you? - I have been asked for a photo or video showing my private parts [translate as appropriate] when I did not want to [1-5]

  • G4a. In the past year, how often have these things happened to you? - Someone offered me money or gifts in return for sexual images or videos [1-5]

  • G4b. In the past year, how often have these things happened to you? - Someone offered me money or gifts to meet them in person to do something sexual [1-5]

  • G4c. In the past year, how often have these things happened to you? - Someone shared sexual images of me without my consent [1-5]

  • G4d. In the past year, how often have these things happened to you? - Someone threatened or blackmailed me to engage in sexual activities [1-5]

# subset data to include relevant vars
data_subset <- data_raw |>
  dplyr::select("Respondent_ID", "COUNTRY", "age", 
                "sex", "presex", "mendecide", 
                "urbanrural", 
                "malerep", "violence", "helpseeking",
                "digitalskills",
                # relevant g1a... g4d items
                contains("abuse_"),
                "wgt_gross", "wgt_scaled"
                ) |>
  dplyr::distinct() |>
  # add id 
  dplyr::mutate(id_rnum=row_number())

# replace with NAs
data_subset[data_subset==888 | data_subset==999] <- NA

# check for duplicate ids
# t <- check_duplicates(data_subset)

0.2.4 Descriptives

# create labels
label(data_subset$sex) <- "Sex"
label(data_subset$age) <-"Age (in years)"
label(data_subset$urbanrural) <-"Urbanicity"

data_subset$age <- as.numeric(data_subset$age)

# render table
table1::table1( ~ sex + age + urbanrural | COUNTRY,
  data = data_subset,
  droplevels = F
)
| Country | N | Boy | Girl | Age, Mean (SD) | Age, Median [Min, Max] | Peri urban | Rural | Urban |
|---|---|---|---|---|---|---|---|---|
| Cambodia | 1007 | 512 (50.8%) | 495 (49.2%) | 14.4 (1.68) | 14.0 [12.0, 17.0] | 139 (13.8%) | 714 (70.9%) | 154 (15.3%) |
| Ethiopia | 1021 | 677 (66.3%) | 344 (33.7%) | 15.9 (1.41) | 16.0 [12.0, 17.0] | 191 (18.7%) | 605 (59.3%) | 225 (22.0%) |
| Indonesia | 1012 | 457 (45.2%) | 555 (54.8%) | 14.5 (1.71) | 15.0 [12.0, 17.0] | 179 (17.7%) | 483 (47.7%) | 350 (34.6%) |
| Kenya | 1016 | 476 (46.9%) | 540 (53.1%) | 14.9 (1.72) | 15.0 [12.0, 17.0] | 28 (2.8%) | 652 (64.2%) | 336 (33.1%) |
| Malaysia | 1009 | 530 (52.5%) | 479 (47.5%) | 14.3 (1.69) | 14.0 [12.0, 17.0] | 170 (16.8%) | 369 (36.6%) | 470 (46.6%) |
| Mozambique | 1111 | 611 (55.0%) | 500 (45.0%) | 15.4 (1.57) | 16.0 [12.0, 17.0] | 70 (6.3%) | 700 (63.0%) | 341 (30.7%) |
| Namibia | 1000 | 499 (49.9%) | 501 (50.1%) | 15.2 (1.66) | 15.0 [12.0, 17.0] | 60 (6.0%) | 463 (46.3%) | 477 (47.7%) |
| Philippines | 975 | 436 (44.7%) | 539 (55.3%) | 14.5 (1.69) | 14.0 [12.0, 17.0] | 75 (7.7%) | 454 (46.6%) | 446 (45.7%) |
| Tanzania | 1000 | 575 (57.5%) | 425 (42.5%) | 15.3 (1.59) | 16.0 [12.0, 17.0] | 37 (3.7%) | 607 (60.7%) | 356 (35.6%) |
| Thailand | 1000 | 426 (42.6%) | 574 (57.4%) | 14.4 (1.59) | 14.0 [12.0, 17.0] | 94 (9.4%) | 533 (53.3%) | 373 (37.3%) |
| Uganda | 1020 | 595 (58.3%) | 425 (41.7%) | 15.8 (1.47) | 16.0 [12.0, 17.0] | 50 (4.9%) | 744 (72.9%) | 226 (22.2%) |
| Vietnam | 998 | 478 (47.9%) | 520 (52.1%) | 14.2 (1.65) | 14.0 [12.0, 17.0] | 86 (8.6%) | 649 (65.0%) | 263 (26.4%) |
| Overall | 12169 | 6272 (51.5%) | 5897 (48.5%) | 14.9 (1.72) | 15.0 [12.0, 17.0] | 1179 (9.7%) | 6973 (57.3%) | 4017 (33.0%) |

0.2.5 Plot OCSEA abuse variables

To understand the prevalence of OCSEA, we use the question ("In the past year, how often have these things happened to you?") to measure whether the child has experienced abuse or exploitation. In total, there are 9 sub-questions, each inquiring about a specific type of abuse. The answers to these sub-questions are originally captured on an ordinal scale ranging from 1 to 5: (1) Never (2) Rarely (3) Sometimes (4) Always (5) Often.
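As a minimal sketch (using a hypothetical response vector, not the survey data itself), the ordinal codes and the 888/999 missingness markers can be handled with a labelled ordered factor in base R:

```r
# hypothetical vector of raw ordinal codes, including 888/999 missingness markers
raw <- c(1, 3, 888, 5, 2, 999, 4)

# recode the survey missingness markers to NA before labelling
raw[raw %in% c(888, 999)] <- NA

# map codes to an ordered factor so the response scale is respected
resp <- factor(raw,
               levels = 1:5,
               labels = c("Never", "Rarely", "Sometimes", "Always", "Often"),
               ordered = TRUE)

table(resp, useNA = "ifany")
```

The same recode-then-label pattern applies to each abuse_* item in the subset above.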

abuse_plt <- data_subset |>
  tidyr::pivot_longer(cols=starts_with("abuse_"),
                      names_to="var",
                      values_to="val")

# replace with na
abuse_plt$val <- replace(abuse_plt$val, abuse_plt$val %in% c(888,999), NA)

abuse_plt |>
  ggplot(aes(x=val)) +
  geom_bar(fill = "skyblue", color = "black") +
  facet_wrap(~var, scales="free_y") +
  labs(
    x="Likert Ordinal Scale Response (1-5)",
    y="Frequency",
    caption="Note. Ordinal scale responses range from 1 to 5:\n
    (1) Never (2) Rarely (3) Sometimes (4) Always (5) Often"
  ) +
  theme_minimal() +
  coord_flip()

0.2.6 Observe missingness

abuse_items <- data_subset |>
  dplyr::select(starts_with("abuse_"), id_rnum)
  

abuse_items[abuse_items==888 | abuse_items==999] <- NA

naniar::vis_miss(abuse_items)

naniar::gg_miss_var(abuse_items)

# count number of missing values by participant
abuse_items <- setDT(abuse_items)
abuse_items <- abuse_items[, na_count := rowSums(is.na(.SD))]

# Create the histogram
ggplot(abuse_items, aes(x = na_count)) +
  geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black") +
  labs(
    title = "Frequency Histogram of Missing Values",
    x = "Number of missing values",
    y = "Frequency"
  ) +
  theme_minimal()

0.3 Imputation of missing values

Remove rows with a high percentage of missingness (i.e., more than four items missing).

threshold_count <- 4

# retain copy of rows with high missingness
abuse_items_rem <- abuse_items |>
  dplyr::filter(na_count > threshold_count)

# remove rows that contain high percentage of missingness
abuse_items_red <- abuse_items |> 
  dplyr::filter(na_count <= threshold_count)

print(paste('Removed n =', nrow(abuse_items_rem), 'respondents with high missingness (of n =',nrow(abuse_items),'). The final sample size is N =', nrow(abuse_items_red)))
## [1] "Removed n = 464 respondents with high missingness (of n = 12169 ). The final sample size is N = 11705"
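The thresholding logic above, sketched on a toy matrix (all values hypothetical):

```r
# toy response matrix: 4 respondents x 5 items, with scattered NAs
toy <- rbind(c(1,  2, NA, 4,  5),
             c(NA, NA, NA, NA, 1),
             c(2,  2, 2,  2,  2),
             c(NA, 1, NA, 3, NA))

threshold_count <- 2            # drop rows with more than 2 missing items

na_count <- rowSums(is.na(toy)) # per-respondent missingness count
keep     <- na_count <= threshold_count

toy_kept <- toy[keep, , drop = FALSE]
nrow(toy_kept)  # 2 respondents retained
```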

0.4 Perform variable imputation

NB (4/9/24): due to low levels of missingness across the cohort, we decided to refrain from performing imputation at this stage. The code below is retained for reference.

abuse_items_fin <- abuse_items_red |>
  dplyr::select(-na_count)

# Perform multiple imputation with mice
imputed_data <- mice(abuse_items_fin, 
                     m = 5,
                     maxit = 50,
                     method = 'pmm',
                     seed = 123)

# View the imputed dataset
summary(imputed_data)

# Complete the dataset with imputed values
completed_data <- complete(imputed_data, action = "long")

# Display the completed dataset
print(completed_data)

stripplot(imputed_data)

1 Sample stratification

At this stage we use listwise deletion for cases with missing data on the target questions (MR, 4/9/24).

abuse_df <- abuse_items |> 
  dplyr::filter(na_count < 1)

# obtain vector of included columns 
id_included <- abuse_df$id_rnum

abuse_df_full <- abuse_df |>
  dplyr::select(-na_count) # drop the helper column; id_rnum is retained

It is also necessary to draw a stratified random sample so that the EFA and CFA procedures are performed on different subsamples. The strata are country and sex.

Convention states that the data should be split as follows for validation:

  • FACTOR STRUCTURE: 70% of the sample used for EFA
  • VALIDATION: 30% of the sample used for CFA
seed_value <- 123 # assumed seed value for reproducibility
set.seed(seed_value) # set global seed

strat_df <- data_subset |>
  dplyr::filter(id_rnum %in% id_included) |>
  dplyr::select(id_rnum, COUNTRY, sex, contains("abuse_"))


strat_df$strata <- interaction(strat_df$COUNTRY, strat_df$sex)

# Perform a stratified random sampling split (70% for EFA, 30% for CFA)
trainIndex <- caret::createDataPartition(strat_df$strata, p = 0.7, list = FALSE)

# Split the data into EFA and CFA sets
efa_data <- strat_df[trainIndex, ]
cfa_data <- strat_df[-trainIndex, ]

Check that the data split is representative.

EFA data:

# Verify representativeness across 'country' and 'gender'
table(efa_data$COUNTRY, efa_data$sex)
##              
##               Boy Girl
##   Cambodia    273  272
##   Ethiopia    358  179
##   Indonesia   283  358
##   Kenya       301  337
##   Malaysia    337  311
##   Mozambique  342  279
##   Namibia     320  329
##   Philippines 234  318
##   Tanzania    369  262
##   Thailand    269  356
##   Uganda      360  255
##   Vietnam     316  341

CFA data:

table(cfa_data$COUNTRY, cfa_data$sex)
##              
##               Boy Girl
##   Cambodia    116  116
##   Ethiopia    153   76
##   Indonesia   120  153
##   Kenya       128  144
##   Malaysia    144  132
##   Mozambique  146  119
##   Namibia     137  141
##   Philippines  99  136
##   Tanzania    158  111
##   Thailand    115  152
##   Uganda      153  108
##   Vietnam     135  145
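The representativeness check can also be formalised with a chi-square test of homogeneity on the per-country sample sizes (counts below are the row totals of the two tables above):

```r
# per-country n in each split (Boy + Girl, from the tables above)
efa_n <- c(545, 537, 641, 638, 648, 621, 649, 552, 631, 625, 615, 657)
cfa_n <- c(232, 229, 273, 272, 276, 265, 278, 235, 269, 267, 261, 280)

tab <- rbind(EFA = efa_n, CFA = cfa_n)
colnames(tab) <- c("Cambodia", "Ethiopia", "Indonesia", "Kenya",
                   "Malaysia", "Mozambique", "Namibia", "Philippines",
                   "Tanzania", "Thailand", "Uganda", "Vietnam")

# a non-significant result indicates the country composition
# does not differ between the EFA and CFA splits
res <- chisq.test(tab)
res$p.value
```

The same test could be repeated on the country-by-sex strata for a stricter check.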

2 Exploratory Factor Analysis (EFA)

2.1 Optimise number of factors

  • Scree plot (factor number vs eigenvalues)
  • Parallel analysis
  • Very simple structure

The scree plot could support an argument for 2 or 3 factors.

Parallel analysis suggests 4 factors, but there is minimal difference in the eigenvalues for principal factors between n=2 and n=4 factors, so an argument can be made for 2.

Very simple structure (estimated via maximum likelihood with rotated factor loadings) shows good fit for 2 factors and slightly improved fit for 3.

abuse_df <- efa_data |>
  dplyr::select(contains("abuse_"))

# scree plot
psych::scree(abuse_df, pc=F)

# very simple structure (VSS)
psych::VSS(abuse_df,
           fa = "fa", fm = "ml")

## 
## Very Simple Structure
## Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
##     n.obs = n.obs, plot = plot, title = title, use = use, cor = cor, 
##     fa = "fa")
## VSS complexity 1 achieves a maximimum of 0.91  with  1  factors
## VSS complexity 2 achieves a maximimum of 0.93  with  2  factors
## 
## The Velicer MAP achieves a minimum of 0.03  with  1  factors 
## BIC achieves a minimum of  -8.72  with  4  factors
## Sample Size adjusted BIC achieves a minimum of  10.34  with  4  factors
## 
## Statistics by number of factors 
##   vss1 vss2   map dof   chisq     prob sqresid  fit RMSEA    BIC SABIC complex
## 1 0.91 0.00 0.031  27 2.8e+03  0.0e+00    2.55 0.91 0.118 2540.6  2626     1.0
## 2 0.54 0.93 0.047  19 9.8e+02 8.1e-195    2.00 0.93 0.083  806.5   867     1.6
## 3 0.46 0.80 0.073  12 5.2e+02 1.9e-103    1.60 0.94 0.076  412.3   450     1.9
## 4 0.35 0.66 0.149   6 4.5e+01  5.4e-08    1.37 0.95 0.030   -8.7    10     2.2
## 5 0.34 0.64 0.188   1 2.4e+01  1.1e-06    1.21 0.96 0.056   14.8    18     2.2
## 6 0.33 0.63 0.280  -3 2.8e-06       NA    1.12 0.96    NA     NA    NA     2.4
## 7 0.28 0.44 0.448  -6 0.0e+00       NA    0.95 0.97    NA     NA    NA     2.7
## 8 0.28 0.51 1.000  -8 1.3e-11       NA    0.92 0.97    NA     NA    NA     2.6
##    eChisq    SRMR eCRMS eBIC
## 1 1.6e+03 5.4e-02 0.063 1317
## 2 5.0e+02 3.1e-02 0.042  333
## 3 1.7e+02 1.8e-02 0.031   59
## 4 1.6e+01 5.5e-03 0.013  -38
## 5 9.9e+00 4.3e-03 0.026    1
## 6 1.3e-06 1.6e-06    NA   NA
## 7 7.7e-12 3.8e-09    NA   NA
## 8 6.9e-12 3.6e-09    NA   NA
# parallel analysis
fa.p <- psych::fa.parallel(abuse_df,
                   fm='ml',
                   fa='fa')

## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA
# sum eigenvalues > 1
sum(fa.p$fa.values>1)
## [1] 1
# sum eigenvalues > .7
sum(fa.p$fa.values>.7)
## [1] 1

2.2 OPTIMAL solution: 2-factor

Based on the results above, analyses will proceed with a two-factor solution. We report maximum likelihood values.

The factors are quite highly correlated (r=.79), suggesting that they are not extremely distinct. An exploratory test of the three-factor solution produced factor correlations in the range r=.67 to .74, so there is little advantage in adding an additional factor. Additionally, one factor had only 2 items.

All items in the two-factor solution have appropriate factor loadings.

efa.fit <- psych::fa(abuse_df,
  nfactors = 2,
  rotate = 'oblimin',
  fm = 'ml'
)

print(efa.fit)
## Factor Analysis using method =  ml
## Call: psych::fa(r = abuse_df, nfactors = 2, rotate = "oblimin", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##           ML1   ML2   h2   u2 com
## abuse_a  0.06  0.50 0.30 0.70 1.0
## abuse_b  0.19  0.44 0.37 0.63 1.4
## abuse_c -0.08  0.85 0.61 0.39 1.0
## abuse_d  0.03  0.79 0.66 0.34 1.0
## abuse_e  0.23  0.56 0.58 0.42 1.3
## abuse_f  0.80 -0.01 0.62 0.38 1.0
## abuse_g  0.86 -0.03 0.70 0.30 1.0
## abuse_h  0.67  0.08 0.55 0.45 1.0
## abuse_i  0.74  0.04 0.60 0.40 1.0
## 
##                        ML1  ML2
## SS loadings           2.67 2.31
## Proportion Var        0.30 0.26
## Cumulative Var        0.30 0.55
## Proportion Explained  0.54 0.46
## Cumulative Proportion 0.54 1.00
## 
##  With factor correlations of 
##     ML1 ML2
## ML1 1.0 0.8
## ML2 0.8 1.0
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  36  with the objective function =  4.45 with Chi Square =  32692.35
## df of  the model are 19  and the objective function was  0.13 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.04 
## 
## The harmonic n.obs is  7359 with the empirical chi square  501.99  with prob <  2.1e-94 
## The total n.obs was  7359  with Likelihood Chi Square =  975.72  with prob <  8.1e-195 
## 
## Tucker Lewis Index of factoring reliability =  0.944
## RMSEA index =  0.083  and the 90 % confidence intervals are  0.078 0.087
## BIC =  806.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML1  ML2
## Correlation of (regression) scores with factors   0.94 0.93
## Multiple R square of scores with factors          0.89 0.87
## Minimum correlation of possible factor scores     0.78 0.73
summary(efa.fit)
## 
## Factor analysis with Call: psych::fa(r = abuse_df, nfactors = 2, rotate = "oblimin", fm = "ml")
## 
## Test of the hypothesis that 2 factors are sufficient.
## The degrees of freedom for the model is 19  and the objective function was  0.13 
## The number of observations was  7359  with Chi Square =  975.72  with prob <  8.1e-195 
## 
## The root mean square of the residuals (RMSA) is  0.03 
## The df corrected root mean square of the residuals is  0.04 
## 
## Tucker Lewis Index of factoring reliability =  0.944
## RMSEA index =  0.083  and the 10 % confidence intervals are  0.078 0.087
## BIC =  806.55
##  With factor correlations of 
##     ML1 ML2
## ML1 1.0 0.8
## ML2 0.8 1.0
# print loadings
efa.fit$loadings %>% 
  print(sort = T, cutoff = .3)
## 
## Loadings:
##         ML1    ML2   
## abuse_f  0.795       
## abuse_g  0.859       
## abuse_h  0.669       
## abuse_i  0.743       
## abuse_a         0.504
## abuse_c         0.845
## abuse_d         0.789
## abuse_e         0.561
## abuse_b         0.445
## 
##                  ML1   ML2
## SS loadings    2.473 2.114
## Proportion Var 0.275 0.235
## Cumulative Var 0.275 0.510
# plots
fa.plot(efa.fit) # fit plot (observe factor clusters)

fa.diagram(efa.fit)

2.3 (Sub-optimal) solution: 3-factor

Due to theoretical interest, a three-factor solution will also be reported. Note that because one factor has only two items, we will not be able to proceed with this solution: the psychometric literature suggests that a minimum of three items per factor is required to produce stable scales.

Regardless, this solution will still be computed and discussed in the report.

efa.fit3 <- psych::fa(abuse_df,
  nfactors = 3,
  rotate = 'oblimin',
  fm = 'ml'
)

print(efa.fit3)
## Factor Analysis using method =  ml
## Call: psych::fa(r = abuse_df, nfactors = 3, rotate = "oblimin", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##           ML1   ML2   ML3   h2   u2 com
## abuse_a -0.03  0.13  0.55 0.39 0.61 1.1
## abuse_b  0.04 -0.03  0.75 0.57 0.43 1.0
## abuse_c -0.02  0.69  0.11 0.58 0.42 1.1
## abuse_d  0.01  0.88 -0.05 0.74 0.26 1.0
## abuse_e  0.24  0.41  0.17 0.57 0.43 2.0
## abuse_f  0.80 -0.01 -0.01 0.63 0.37 1.0
## abuse_g  0.86 -0.01 -0.02 0.70 0.30 1.0
## abuse_h  0.67  0.02  0.07 0.55 0.45 1.0
## abuse_i  0.75  0.02  0.00 0.60 0.40 1.0
## 
##                        ML1  ML2  ML3
## SS loadings           2.60 1.65 1.08
## Proportion Var        0.29 0.18 0.12
## Cumulative Var        0.29 0.47 0.59
## Proportion Explained  0.49 0.31 0.20
## Cumulative Proportion 0.49 0.80 1.00
## 
##  With factor correlations of 
##      ML1  ML2  ML3
## ML1 1.00 0.77 0.70
## ML2 0.77 1.00 0.71
## ML3 0.70 0.71 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  4.45 with Chi Square =  32692.35
## df of  the model are 12  and the objective function was  0.07 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  7359 with the empirical chi square  165.85  with prob <  3.4e-29 
## The total n.obs was  7359  with Likelihood Chi Square =  519.11  with prob <  1.9e-103 
## 
## Tucker Lewis Index of factoring reliability =  0.953
## RMSEA index =  0.076  and the 90 % confidence intervals are  0.07 0.081
## BIC =  412.27
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML1  ML2  ML3
## Correlation of (regression) scores with factors   0.94 0.93 0.87
## Multiple R square of scores with factors          0.89 0.86 0.76
## Minimum correlation of possible factor scores     0.78 0.72 0.51
summary(efa.fit3)
## 
## Factor analysis with Call: psych::fa(r = abuse_df, nfactors = 3, rotate = "oblimin", fm = "ml")
## 
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 12  and the objective function was  0.07 
## The number of observations was  7359  with Chi Square =  519.11  with prob <  1.9e-103 
## 
## The root mean square of the residuals (RMSA) is  0.02 
## The df corrected root mean square of the residuals is  0.03 
## 
## Tucker Lewis Index of factoring reliability =  0.953
## RMSEA index =  0.076  and the 10 % confidence intervals are  0.07 0.081
## BIC =  412.27
##  With factor correlations of 
##      ML1  ML2  ML3
## ML1 1.00 0.77 0.70
## ML2 0.77 1.00 0.71
## ML3 0.70 0.71 1.00
# print loadings
efa.fit3$loadings %>% 
  print(sort = T, cutoff = .3)
## 
## Loadings:
##         ML1    ML2    ML3   
## abuse_f  0.804              
## abuse_g  0.857              
## abuse_h  0.667              
## abuse_i  0.755              
## abuse_c         0.689       
## abuse_d         0.883       
## abuse_a                0.548
## abuse_b                0.753
## abuse_e         0.413       
## 
##                  ML1   ML2   ML3
## SS loadings    2.459 1.446 0.916
## Proportion Var 0.273 0.161 0.102
## Cumulative Var 0.273 0.434 0.536
# plots
fa.plot(efa.fit3) # fit plot (observe factor clusters)

fa.diagram(efa.fit3)

threshold <- 0.3 # minimum absolute loading to display

load_mx <- as.data.frame(efa.fit3$loadings[])
load_mx <- round(load_mx, 3)
load_mx <- load_mx[, order(colnames(load_mx))] # order columns

# blank out loadings below the display threshold
load_mx[abs(load_mx) < threshold] <- '-' 
load_mx <- load_mx %>%
  dplyr::mutate(study_qid=row.names(.))

abuse_load_ref <- abuse_item_ref |>
  dplyr::left_join(load_mx)

abuse_load_ref |>
  kbl() |>
  kable_classic(full_width = F, html_font = "Times")

2.4 Compute fit indices

Fit indices will be produced with the following heuristics:

  • CFI (Comparative Fit Index): Values close to or above 0.95 indicate a good fit.
  • TLI (Tucker-Lewis Index): Values close to or above 0.95 indicate a good fit.
  • RMSEA (Root Mean Square Error of Approximation): Values below 0.08 indicate a good fit.
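These heuristics can be collected into a small helper function (a sketch; the function name and exact cut-offs are our own):

```r
# classify fit indices against conventional cut-offs
# (CFI/TLI >= .95 and RMSEA < .08 count as "good")
fit_heuristic <- function(cfi, tli, rmsea) {
  c(cfi   = ifelse(cfi   >= 0.95, "good", "poor"),
    tli   = ifelse(tli   >= 0.95, "good", "poor"),
    rmsea = ifelse(rmsea <  0.08, "good", "poor"))
}

# values in the ballpark of the 2-factor EFA reported in this section
verdict <- fit_heuristic(cfi = 0.97, tli = 0.944, rmsea = 0.083)
verdict
```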
print(paste("RMS:", efa.fit$rms))
## [1] "RMS: 0.0307802696093083"
print(efa.fit$RMSEA)
##      RMSEA      lower      upper confidence 
## 0.08271892 0.07834717 0.08718683 0.90000000
print(paste("TLI:", efa.fit$TLI))
## [1] "TLI: 0.944480777902642"
# comparative fit index (CFI; typically slightly higher than TLI).
# CFI = 1 - (chisq - df) / (null chisq - null df); approximately 0.971
# given the chi-square values reported above
print(paste("CFI:", 1 - (efa.fit$STATISTIC - efa.fit$dof) /
              (efa.fit$null.chisq - efa.fit$null.dof)))
# efa.fit

2.5 Reliability

Compute reliability within each factor: F1 alpha = .86 and F2 alpha = .81.

Both reliability indices are within the acceptable range.
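For reference, Cronbach's alpha reduces to a short base-R computation (a sketch on hypothetical, perfectly correlated items; the analysis itself uses psych::alpha and psych::omega):

```r
# alpha = k/(k-1) * (1 - sum(item variances) / variance of the sum score)
cronbach_alpha <- function(items) {
  k <- ncol(items)
  k / (k - 1) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}

# three perfectly correlated items yield an alpha of exactly 1
x <- 1:10
m <- cbind(x, x, x)
cronbach_alpha(m)  # 1
```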

f1_items <- paste0('abuse_',c("f", "g", "h", "i"))
f2_items <- paste0('abuse_',c("a", "b", "c", "d", "e"))

abuse_df <- as.data.frame(abuse_df)

# compute reliability
f1_alpha <- psych::alpha(abuse_df[,f1_items], check.keys = T)
f1_omega <- psych::omega(abuse_df[,f1_items])

f1_alpha$total |>
  dplyr::mutate(omega=f1_omega$omega_h) |>
  kbl() |>
  kable_styling(full_width = F) |>
  footnote(general = "Reliability statistics computed over factor 1")
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r omega
0.8642657 0.8644664 0.8344829 0.6145783 6.378242 0.0025918 1.091113 0.3738094 0.6041995 0.8245792
Note:
Reliability statistics computed over factor 1
f2_alpha <- psych::alpha(abuse_df[,f2_items], check.keys = T)
f2_omega <- psych::omega(abuse_df[,f2_items])

f2_alpha$total |>
  dplyr::mutate(omega=f2_omega$omega_h) |>
  kbl() |>
  kable_styling(full_width = F) |>
  footnote(general = "Reliability statistics computed over factor 2")
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r omega
0.8155218 0.8281917 0.8056802 0.4908579 4.82044 0.0034002 1.186112 0.4473687 0.4577664 0.7077855
Note:
Reliability statistics computed over factor 2

2.6 Display items in each category

abuse_item_ref |>
  dplyr::mutate(factor=case_when(
    study_qid %in% f1_items ~ 1,
    study_qid %in% f2_items ~ 2
  )) |>
  dplyr::select(-unicef_qid) |>
  kbl() |>
  kable_classic(full_width = F, html_font = "Times")
| study_qid | item | factor |
|---|---|---|
| abuse_a | Someone made sexual comments about me | 2 |
| abuse_b | Someone sent me sexual images I did not want | 2 |
| abuse_c | I have been asked to talk about sex or sexual acts with someone when I did not want to | 2 |
| abuse_d | I have been asked by someone to do something sexual when I did not want to | 2 |
| abuse_e | I have been asked for a photo or video showing my private parts [translate as appropriate] when I did not want to | 2 |
| abuse_f | Someone offered me money or gifts in return for sexual images or videos | 1 |
| abuse_g | Someone offered me money or gifts to meet them in person to do something sexual | 1 |
| abuse_h | Someone shared sexual images of me without my consent | 1 |
| abuse_i | Someone threatened or blackmailed me to engage in sexual activities | 1 |

threshold <- 0.3

# loadings matrix
load_mx <- as.data.frame(efa.fit$loadings[])
load_mx <- round(load_mx,3)
load_mx <- load_mx[,order(colnames(load_mx))] # order columns

# blank out loadings below the display threshold
load_mx[abs(load_mx) < threshold] <- '-' 
load_mx <- load_mx %>%
  dplyr::mutate(study_qid=row.names(.))

abuse_load_ref <- abuse_item_ref |>
  dplyr::left_join(load_mx)

abuse_load_ref |>
  kbl() |>
  kable_classic(full_width = F, html_font = "Times")
| unicef_qid | study_qid | item | ML1 | ML2 |
|---|---|---|---|---|
| G1a | abuse_a | Someone made sexual comments about me | - | 0.504 |
| G1c | abuse_b | Someone sent me sexual images I did not want | - | 0.445 |
| G3b | abuse_c | I have been asked to talk about sex or sexual acts with someone when I did not want to | - | 0.845 |
| G3c | abuse_d | I have been asked by someone to do something sexual when I did not want to | - | 0.789 |
| G3d | abuse_e | I have been asked for a photo or video showing my private parts [translate as appropriate] when I did not want to | - | 0.561 |
| G4a | abuse_f | Someone offered me money or gifts in return for sexual images or videos | 0.795 | - |
| G4b | abuse_g | Someone offered me money or gifts to meet them in person to do something sexual | 0.859 | - |
| G4c | abuse_h | Someone shared sexual images of me without my consent | 0.669 | - |
| G4d | abuse_i | Someone threatened or blackmailed me to engage in sexual activities | 0.743 | - |

3 Confirmatory Factor Analysis (CFA)

abuse_df <- cfa_data |>
  dplyr::select(contains("abuse_"))

abuse.model <- "A =~ abuse_f + abuse_g + abuse_h + abuse_i
                A ~~ A
                B =~ abuse_a + abuse_b + abuse_c + abuse_d + abuse_e
                B ~~ B"

cfa.abuse <- lavaan::cfa(model=abuse.model, data=abuse_df, 
                         missing = "pairwise", effect.coding = T, 
                         se = "robust", test = "Yuan.Bentler")

summary(cfa.abuse)
## lavaan 0.6.17 ended normally after 44 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        21
##   Number of equality constraints                     2
## 
##   Number of observations                          3137
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                               619.445     102.684
##   Degrees of freedom                                26          26
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  6.033
##     Yuan-Bentler correction                                       
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   A =~                                                
##     abuse_f           1.094    0.046   23.569    0.000
##     abuse_g           1.107    0.043   25.769    0.000
##     abuse_h           0.960    0.061   15.778    0.000
##     abuse_i           0.840    0.051   16.485    0.000
##   B =~                                                
##     abuse_a           0.965    0.041   23.335    0.000
##     abuse_b           1.077    0.040   26.868    0.000
##     abuse_c           1.050    0.036   29.331    0.000
##     abuse_d           1.009    0.036   28.136    0.000
##     abuse_e           0.899    0.041   22.172    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   A ~~                                                
##     B                 0.101    0.013    7.772    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     A                 0.104    0.014    7.465    0.000
##     B                 0.153    0.014   10.712    0.000
##    .abuse_f           0.052    0.008    6.887    0.000
##    .abuse_g           0.044    0.007    6.621    0.000
##    .abuse_h           0.084    0.011    7.668    0.000
##    .abuse_i           0.080    0.011    7.015    0.000
##    .abuse_a           0.279    0.018   15.924    0.000
##    .abuse_b           0.287    0.019   15.222    0.000
##    .abuse_c           0.090    0.008   10.910    0.000
##    .abuse_d           0.079    0.008   10.584    0.000
##    .abuse_e           0.093    0.010    9.113    0.000
lavaan::fitmeasures(cfa.abuse)
##                          npar                          fmin 
##                        19.000                         0.099 
##                         chisq                            df 
##                       619.445                        26.000 
##                        pvalue                  chisq.scaled 
##                         0.000                       102.684 
##                     df.scaled                 pvalue.scaled 
##                        26.000                         0.000 
##          chisq.scaling.factor                baseline.chisq 
##                         6.033                     14515.043 
##                   baseline.df               baseline.pvalue 
##                        36.000                         0.000 
##         baseline.chisq.scaled            baseline.df.scaled 
##                       964.742                        36.000 
##        baseline.pvalue.scaled baseline.chisq.scaling.factor 
##                         0.000                        15.046 
##                           cfi                           tli 
##                         0.959                         0.943 
##                    cfi.scaled                    tli.scaled 
##                         0.917                         0.886 
##                    cfi.robust                    tli.robust 
##                         0.967                         0.954 
##                          nnfi                           rfi 
##                         0.943                         0.941 
##                           nfi                          pnfi 
##                         0.957                         0.691 
##                           ifi                           rni 
##                         0.959                         0.959 
##                   nnfi.scaled                    rfi.scaled 
##                         0.886                         0.853 
##                    nfi.scaled                   pnfi.scaled 
##                         0.894                         0.645 
##                    ifi.scaled                    rni.scaled 
##                         0.918                         0.917 
##                   nnfi.robust                    rni.robust 
##                         0.954                         0.967 
##                          logl             unrestricted.logl 
##                    -12649.822                    -12340.100 
##                           aic                           bic 
##                     25337.645                     25452.614 
##                        ntotal                          bic2 
##                      3137.000                     25392.243 
##             scaling.factor.h1             scaling.factor.h0 
##                         8.412                        10.813 
##                         rmsea                rmsea.ci.lower 
##                         0.085                         0.080 
##                rmsea.ci.upper                rmsea.ci.level 
##                         0.091                         0.900 
##                  rmsea.pvalue                rmsea.close.h0 
##                         0.000                         0.050 
##         rmsea.notclose.pvalue             rmsea.notclose.h0 
##                         0.935                         0.080 
##                  rmsea.scaled         rmsea.ci.lower.scaled 
##                         0.031                         0.028 
##         rmsea.ci.upper.scaled           rmsea.pvalue.scaled 
##                         0.033                         1.000 
##  rmsea.notclose.pvalue.scaled                  rmsea.robust 
##                         0.000                         0.075 
##         rmsea.ci.lower.robust         rmsea.ci.upper.robust 
##                         0.060                         0.091 
##           rmsea.pvalue.robust  rmsea.notclose.pvalue.robust 
##                         0.003                         0.324 
##                           rmr                    rmr_nomean 
##                         0.012                         0.012 
##                          srmr                  srmr_bentler 
##                         0.040                         0.040 
##           srmr_bentler_nomean                          crmr 
##                         0.040                         0.044 
##                   crmr_nomean                    srmr_mplus 
##                         0.044                         0.040 
##             srmr_mplus_nomean                         cn_05 
##                         0.040                       197.923 
##                         cn_01                           gfi 
##                       232.139                         0.957 
##                          agfi                          pgfi 
##                         0.925                         0.553 
##                           mfi                          ecvi 
##                         0.910                         0.210
semTools::reliability(cfa.abuse)
##                A         B
## alpha  0.8608803 0.8270482
## omega  0.8655533 0.8215162
## omega2 0.8655533 0.8215162
## omega3 0.8678302 0.8093637
## avevar 0.6195729 0.4803087
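Note that `semTools::reliability()` has been deprecated in recent semTools releases in favour of `compRelSEM()`. If reproducing this analysis with a current semTools, a roughly equivalent call is the following (a sketch; check the argument defaults against your installed version):

```r
# composite (omega-type) reliability per factor, newer semTools API
semTools::compRelSEM(cfa.abuse)
```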

4 Measurement invariance

Subset data to include only participants used in the FA.

# obtain data including only relevant rows
inv_df <- data_subset |>
  dplyr::filter(id_rnum %in% id_included)

inv_df$age <- as.numeric(inv_df$age)

# create variable labels for the table
label(inv_df$sex) <- "Sex"
label(inv_df$age) <- "Age (in years)"
label(inv_df$urbanrural) <- "Urbanicity"

# render table
table1::table1( ~ sex + age + urbanrural | COUNTRY,
  data = inv_df,
  droplevels = F
)
Country      N      Boy           Girl          Age Mean (SD)  Age Median [Min, Max]  Peri-urban   Rural         Urban
Cambodia     777    389 (50.1%)   388 (49.9%)   14.4 (1.71)    14.0 [12.0, 17.0]      118 (15.2%)  531 (68.3%)   128 (16.5%)
Ethiopia     766    511 (66.7%)   255 (33.3%)   15.9 (1.43)    16.0 [12.0, 17.0]      141 (18.4%)  442 (57.7%)   183 (23.9%)
Indonesia    914    403 (44.1%)   511 (55.9%)   14.5 (1.71)    15.0 [12.0, 17.0]      165 (18.1%)  433 (47.4%)   316 (34.6%)
Kenya        910    429 (47.1%)   481 (52.9%)   14.9 (1.73)    15.0 [12.0, 17.0]      26 (2.9%)    585 (64.3%)   299 (32.9%)
Malaysia     924    481 (52.1%)   443 (47.9%)   14.3 (1.69)    14.0 [12.0, 17.0]      157 (17.0%)  331 (35.8%)   436 (47.2%)
Mozambique   886    488 (55.1%)   398 (44.9%)   15.4 (1.55)    16.0 [12.0, 17.0]      52 (5.9%)    545 (61.5%)   289 (32.6%)
Namibia      927    457 (49.3%)   470 (50.7%)   15.2 (1.67)    15.0 [12.0, 17.0]      58 (6.3%)    434 (46.8%)   435 (46.9%)
Philippines  787    333 (42.3%)   454 (57.7%)   14.5 (1.69)    14.0 [12.0, 17.0]      65 (8.3%)    375 (47.6%)   347 (44.1%)
Tanzania     900    527 (58.6%)   373 (41.4%)   15.3 (1.60)    16.0 [12.0, 17.0]      37 (4.1%)    549 (61.0%)   314 (34.9%)
Thailand     892    384 (43.0%)   508 (57.0%)   14.4 (1.60)    14.0 [12.0, 17.0]      76 (8.5%)    468 (52.5%)   348 (39.0%)
Uganda       876    513 (58.6%)   363 (41.4%)   15.8 (1.47)    16.0 [12.0, 17.0]      45 (5.1%)    624 (71.2%)   207 (23.6%)
Vietnam      937    451 (48.1%)   486 (51.9%)   14.2 (1.64)    14.0 [12.0, 17.0]      86 (9.2%)    603 (64.4%)   248 (26.5%)
Overall      10496  5366 (51.1%)  5130 (48.9%)  14.9 (1.72)    15.0 [12.0, 17.0]      1026 (9.8%)  5920 (56.4%)  3550 (33.8%)

4.1 MI: Sex

# i) Configural model: same factor structure in each group
fit.tp_sex_configural.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "sex",
  estimator = "MLR"
)

# ii) Metric (weak) model: loadings constrained equal across groups
fit.tp_sex_weak.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "sex",
  estimator = "MLR",
  group.equal = "loadings"
)

# iii) Scalar (strong) model: loadings and intercepts constrained equal
fit.tp_sex_strong.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "sex",
  estimator = "MLR",
  group.equal = c("loadings", "intercepts")
)

# iv) Residual (strict) model: loadings, intercepts, and residuals equal
fit.tp_sex_strict.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "sex",
  estimator = "MLR",
  group.equal = c("loadings", "intercepts", "residuals")
)

4.1.1 Results

Interpretation of results:

  1. Configural vs. metric (weak) model: If the chi-square difference test between the configural and metric model is non-significant (p > 0.05), constraining the loadings to be equal across groups does not significantly worsen model fit. This supports metric invariance.

  2. Metric (weak) vs. scalar (strong) model: If the chi-square difference test between the metric and scalar model is non-significant (p > 0.05), constraining both the loadings and intercepts to be equal across groups does not significantly worsen model fit. This supports scalar invariance.

  3. Scalar (strong) vs. residual (strict) model: If the chi-square difference test between the scalar and residual model is non-significant (p > 0.05), constraining the residual variances to be equal across groups does not significantly worsen model fit. This supports residual (strict) invariance.
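With samples this large, the chi-square difference test is sensitive to trivial misfit, so changes in approximate fit indices (a ΔCFI smaller than .01 is a commonly used criterion) are usually inspected alongside it. As an alternative to running `anova()` pairwise, `semTools::compareFit()` summarises the nested comparisons and fit-index differences in one call (a sketch, assuming the fitted objects defined above):

```r
# nested model comparisons with scaled chi-square difference tests
# and fit-index differences in a single summary
mi.comparison <- semTools::compareFit(
  fit.tp_sex_configural.cfa,
  fit.tp_sex_weak.cfa,
  fit.tp_sex_strong.cfa,
  fit.tp_sex_strict.cfa,
  nested = TRUE
)
summary(mi.comparison)
```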

# Pairwise comparisons using anova function
anova_configural_metric <- anova(fit.tp_sex_configural.cfa, fit.tp_sex_weak.cfa)
anova_metric_scalar <- anova(fit.tp_sex_weak.cfa, fit.tp_sex_strong.cfa)
anova_scalar_residual <- anova(fit.tp_sex_strong.cfa, fit.tp_sex_strict.cfa)

# Print pairwise comparison results
print(anova_configural_metric)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                           Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## fit.tp_sex_configural.cfa 52 93415 93821 1864.5                              
## fit.tp_sex_weak.cfa       59 93431 93787 1894.8     3.5536       7     0.8295
print(anova_metric_scalar)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                       Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## fit.tp_sex_weak.cfa   59 93431 93787 1894.8                                  
## fit.tp_sex_strong.cfa 66 93442 93747 1919.6     24.937       7  0.0007785 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(anova_scalar_residual)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                       Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)   
## fit.tp_sex_strong.cfa 66 93442 93747 1919.6                                 
## fit.tp_sex_strict.cfa 75 93786 94025 2281.4      26.68       9   0.001579 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Extract fit indices for each model
fit_indices_configural <- fitMeasures(fit.tp_sex_configural.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_metric <- fitMeasures(fit.tp_sex_weak.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_scalar <- fitMeasures(fit.tp_sex_strong.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_residual <- fitMeasures(fit.tp_sex_strict.cfa, c("cfi", "tli", "rmsea", "srmr"))

# Calculate fit indices differences
cfi_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["cfi"] - fitmeasures(fit.tp_sex_strong.cfa)["cfi"]
tli_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["tli"] - fitmeasures(fit.tp_sex_strong.cfa)["tli"]
rmsea_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["rmsea"] - fitmeasures(fit.tp_sex_strong.cfa)["rmsea"]
srmr_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["srmr"] - fitmeasures(fit.tp_sex_strong.cfa)["srmr"]

cfi_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["cfi"] - fitmeasures(fit.tp_sex_strict.cfa)["cfi"]
tli_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["tli"] - fitmeasures(fit.tp_sex_strict.cfa)["tli"]
rmsea_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["rmsea"] - fitmeasures(fit.tp_sex_strict.cfa)["rmsea"]
srmr_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["srmr"] - fitmeasures(fit.tp_sex_strict.cfa)["srmr"]

# Create a data frame with the results
# (the fit-index columns show the configural model's absolute fit in row 1
#  and fit-index *differences* in rows 2-3)
results_sex <- data.frame(
  Comparison = c("Configural vs Metric", "Metric vs Scalar", "Scalar vs Residual"),
  Chi_Square_Diff = c(anova_configural_metric$Chisq[2], anova_metric_scalar$Chisq[2], anova_scalar_residual$Chisq[2]),
  DF_Diff = c(anova_configural_metric$Df[2], anova_metric_scalar$Df[2], anova_scalar_residual$Df[2]),
  p_value = c(anova_configural_metric$`Pr(>Chisq)`[2], anova_metric_scalar$`Pr(>Chisq)`[2], anova_scalar_residual$`Pr(>Chisq)`[2]),
  CFI = c(fit_indices_configural["cfi"], cfi_diff_metric_scalar, cfi_diff_scalar_residual),
  TLI = c(fit_indices_configural["tli"], tli_diff_metric_scalar, tli_diff_scalar_residual),
  RMSEA = c(fit_indices_configural["rmsea"], rmsea_diff_metric_scalar, rmsea_diff_scalar_residual),
  SRMR = c(fit_indices_configural["srmr"], srmr_diff_metric_scalar, srmr_diff_scalar_residual)
)

# output model comparison indices
results_sex |>
  kbl() |>
  kable_styling(full_width = F)
Comparison            Chi_Square_Diff  DF_Diff  p_value    CFI        TLI         RMSEA       SRMR
Configural vs Metric  1894.784         59       0.8295162  0.9615500  0.9467615   0.0814967   0.0300867
Metric vs Scalar      1919.565         66       0.0007785  0.0003772  -0.0046290  0.0038460   -0.0003722
Scalar vs Residual    2281.380         75       0.0015795  0.0074846  0.0020377   -0.0017173  -0.0048051
# Compare the models
anova_results_sex <- anova(fit.tp_sex_configural.cfa, fit.tp_sex_weak.cfa, fit.tp_sex_strong.cfa, fit.tp_sex_strict.cfa)

# Print the comparison results
print(anova_results_sex)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                           Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## fit.tp_sex_configural.cfa 52 93415 93821 1864.5                              
## fit.tp_sex_weak.cfa       59 93431 93787 1894.8     3.5536       7  0.8295162
## fit.tp_sex_strong.cfa     66 93442 93747 1919.6    24.9373       7  0.0007785
## fit.tp_sex_strict.cfa     75 93786 94025 2281.4    26.6804       9  0.0015795
##                              
## fit.tp_sex_configural.cfa    
## fit.tp_sex_weak.cfa          
## fit.tp_sex_strong.cfa     ***
## fit.tp_sex_strict.cfa     ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_results_sex |>
  kbl() |>
  kable_styling(full_width = F)
                           Df  AIC       BIC       Chisq     Chisq diff  Df diff  Pr(>Chisq)
fit.tp_sex_configural.cfa  52  93414.81  93821.30  1864.496  NA          NA       NA
fit.tp_sex_weak.cfa        59  93431.10  93786.77  1894.784  3.553598    7        0.8295162
fit.tp_sex_strong.cfa      66  93441.88  93746.74  1919.565  24.937263   7        0.0007785
fit.tp_sex_strict.cfa      75  93785.69  94025.23  2281.380  26.680420   9        0.0015795
# tabulate additional fit indices
fit_indices_sex <- data.frame(
  Model = c("Configural", "Metric", "Scalar", "Residual"),
  CFI = c(
    fitMeasures(fit.tp_sex_configural.cfa, "cfi"),
    fitMeasures(fit.tp_sex_weak.cfa, "cfi"),
    fitMeasures(fit.tp_sex_strong.cfa, "cfi"),
    fitMeasures(fit.tp_sex_strict.cfa, "cfi")
  ),
  TLI = c(
    fitMeasures(fit.tp_sex_configural.cfa, "tli"),
    fitMeasures(fit.tp_sex_weak.cfa, "tli"),
    fitMeasures(fit.tp_sex_strong.cfa, "tli"),
    fitMeasures(fit.tp_sex_strict.cfa, "tli")
  ),
  RMSEA = c(
    fitMeasures(fit.tp_sex_configural.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_weak.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_strong.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_strict.cfa, "rmsea")
  )
)

# output fit indices
fit_indices_sex |>
  kbl() |>
  kable_styling(full_width = F)
Model       CFI        TLI        RMSEA
Configural  0.9615500  0.9467615  0.0814967
Metric      0.9610559  0.9524751  0.0769995
Scalar      0.9606787  0.9571041  0.0731535
Residual    0.9531942  0.9550664  0.0748708

4.1.2 Latent means plot

lavaan::parameterestimates(fit.tp_sex_strict.cfa) %>% 
  filter(grepl("abuse_.$", lhs) & op == "~1") %>% 
  ggplot(aes(x = lhs, 
             y = est, 
             group = as.factor(group), 
             color = as.factor(group), 
             ymin = ci.lower, 
             ymax = ci.upper)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_errorbar(width = 0.3, position = position_dodge(width = 0.5), alpha = 0.7) +
  geom_line(size = 1, position = position_dodge(width = 0.5), alpha = 0.7) +
  scale_color_lancet(labels = c("Female", "Male")) +  # ggsci palette
  labs(x = "Subscale", y = "Latent mean", color = "Sex", 
       title = "Latent Means by Sex") +
  theme_minimal() +
  theme(
    legend.position = "top",
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 10),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10, color = "black"),
    axis.text.y = element_text(size = 10, color = "black"),
    axis.title = element_text(size = 12, color = "black"),
    plot.title = element_text(size = 14, hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(size = 12, hjust = 0.5)
  )

4.2 MI: Age

4.2.1 Define age categories

Determine appropriate age categories across the cohort.

Because age is recorded in whole years, exactly equal terciles are not possible; we instead split age at approximate tercile cutoffs. The resulting age categories and sample sizes are as follows:

inv_df |>
  ggplot(aes(x=age)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black", alpha = 0.7) +
  labs(x = "Age", y = "Count", title = "Histogram of Age by Sample")

# define age cutoffs
age_cutoffs <- quantile(unique(inv_df$age), probs = c(1/3, 2/3))

# define
inv_df <- inv_df |>
  dplyr::arrange(age) |>
  dplyr::mutate(age_tercile = cut(age, breaks = c(-Inf, age_cutoffs, Inf), 
                                  labels = c(1, 2, 3)),
                .after=age)

inv_df |>
  dplyr::group_by(age_tercile) |>
  dplyr::summarise(
    count=n(),
    min_age=min(age,na.rm=T),
    max_age=max(age,na.rm=T)
  ) |>
  dplyr::ungroup() |>
  kbl() |>
  kable_classic(full_width = F, html_font = "Times")
age_tercile  count  min_age  max_age
1            2706   12       13
2            3285   14       15
3            4505   16       17
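Because age takes only six integer values (12-17), any split yields unequal group sizes, and `quantile(unique(inv_df$age), ...)` computes cutoffs from the six distinct values rather than from the observed distribution. A variant that takes cutoffs from the full distribution would look like this (a hypothetical alternative, not the split used above):

```r
# cutoffs from the observed age distribution (not the distinct values)
age_cutoffs_dist <- quantile(inv_df$age, probs = c(1/3, 2/3), na.rm = TRUE)

# unique() guards against duplicated cutoffs with discrete ages
table(cut(inv_df$age, breaks = unique(c(-Inf, age_cutoffs_dist, Inf)),
          labels = FALSE))
```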

Set up the models. Note that the fit objects reuse the `fit.tp_sex_*` names from the sex analysis, but the grouping variable here is `age_tercile`:

# i) Configural model: same factor structure in each group
fit.tp_sex_configural.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "age_tercile",
  estimator = "MLR"
)

# ii) Metric (weak) model: loadings constrained equal across groups
fit.tp_sex_weak.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "age_tercile",
  estimator = "MLR",
  group.equal = "loadings"
)

# iii) Scalar (strong) model: loadings and intercepts constrained equal
fit.tp_sex_strong.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "age_tercile",
  estimator = "MLR",
  group.equal = c("loadings", "intercepts")
)

# iv) Residual (strict) model: loadings, intercepts, and residuals equal
fit.tp_sex_strict.cfa <- lavaan::cfa(
  abuse.model,
  data = labelled::remove_labels(inv_df),
  effect.coding = TRUE,
  meanstructure = TRUE,
  missing = "pairwise",
  group = "age_tercile",
  estimator = "MLR",
  group.equal = c("loadings", "intercepts", "residuals")
)

4.2.2 Results

Interpretation of results follows the same logic as for sex (Section 4.1.1): a non-significant chi-square difference test at each step (configural vs. metric, metric vs. scalar, scalar vs. residual) supports the corresponding level of invariance.

# Pairwise comparisons using anova function
anova_configural_metric <- anova(fit.tp_sex_configural.cfa, fit.tp_sex_weak.cfa)
anova_metric_scalar <- anova(fit.tp_sex_weak.cfa, fit.tp_sex_strong.cfa)
anova_scalar_residual <- anova(fit.tp_sex_strong.cfa, fit.tp_sex_strict.cfa)

# Print pairwise comparison results
print(anova_configural_metric)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                           Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## fit.tp_sex_configural.cfa 78 88446 89055 1942.5                              
## fit.tp_sex_weak.cfa       92 88855 89363 2379.7     47.327      14  1.698e-05
##                              
## fit.tp_sex_configural.cfa    
## fit.tp_sex_weak.cfa       ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(anova_metric_scalar)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                        Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## fit.tp_sex_weak.cfa    92 88855 89363 2379.7                                  
## fit.tp_sex_strong.cfa 106 88980 89386 2532.8     142.96      14  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(anova_scalar_residual)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                        Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## fit.tp_sex_strong.cfa 106 88980 89386 2532.8                                  
## fit.tp_sex_strict.cfa 124 93226 93502 6815.4      251.1      18  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Extract fit indices for each model
fit_indices_configural <- fitMeasures(fit.tp_sex_configural.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_metric <- fitMeasures(fit.tp_sex_weak.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_scalar <- fitMeasures(fit.tp_sex_strong.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_residual <- fitMeasures(fit.tp_sex_strict.cfa, c("cfi", "tli", "rmsea", "srmr"))

# Calculate fit indices differences
cfi_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["cfi"] - fitmeasures(fit.tp_sex_strong.cfa)["cfi"]
tli_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["tli"] - fitmeasures(fit.tp_sex_strong.cfa)["tli"]
rmsea_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["rmsea"] - fitmeasures(fit.tp_sex_strong.cfa)["rmsea"]
srmr_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["srmr"] - fitmeasures(fit.tp_sex_strong.cfa)["srmr"]

cfi_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["cfi"] - fitmeasures(fit.tp_sex_strict.cfa)["cfi"]
tli_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["tli"] - fitmeasures(fit.tp_sex_strict.cfa)["tli"]
rmsea_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["rmsea"] - fitmeasures(fit.tp_sex_strict.cfa)["rmsea"]
srmr_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["srmr"] - fitmeasures(fit.tp_sex_strict.cfa)["srmr"]

# Create a data frame with the results
# (the fit-index columns show the configural model's absolute fit in row 1
#  and fit-index *differences* in rows 2-3)
results_sex <- data.frame(
  Comparison = c("Configural vs Metric", "Metric vs Scalar", "Scalar vs Residual"),
  Chi_Square_Diff = c(anova_configural_metric$Chisq[2], anova_metric_scalar$Chisq[2], anova_scalar_residual$Chisq[2]),
  DF_Diff = c(anova_configural_metric$Df[2], anova_metric_scalar$Df[2], anova_scalar_residual$Df[2]),
  p_value = c(anova_configural_metric$`Pr(>Chisq)`[2], anova_metric_scalar$`Pr(>Chisq)`[2], anova_scalar_residual$`Pr(>Chisq)`[2]),
  CFI = c(fit_indices_configural["cfi"], cfi_diff_metric_scalar, cfi_diff_scalar_residual),
  TLI = c(fit_indices_configural["tli"], tli_diff_metric_scalar, tli_diff_scalar_residual),
  RMSEA = c(fit_indices_configural["rmsea"], rmsea_diff_metric_scalar, rmsea_diff_scalar_residual),
  SRMR = c(fit_indices_configural["srmr"], srmr_diff_metric_scalar, srmr_diff_scalar_residual)
)

# output model comparison indices
results_sex |>
  kbl() |>
  kable_styling(full_width = F)
Comparison            Chi_Square_Diff  DF_Diff  p_value  CFI        TLI         RMSEA       SRMR
Configural vs Metric  2379.744         92       1.7e-05  0.9631258  0.9489434   0.0826569   0.0304052
Metric vs Scalar      2532.841         106      0.0e+00  0.0027510  -0.0042122  0.0034120   -0.0013302
Scalar vs Residual    6815.394         124      0.0e+00  0.0843415  0.0663600   -0.0432987  -0.0255152
# Compare the models
anova_results_sex <- anova(fit.tp_sex_configural.cfa, fit.tp_sex_weak.cfa, fit.tp_sex_strong.cfa, fit.tp_sex_strict.cfa)

# Print the comparison results
print(anova_results_sex)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                            Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## fit.tp_sex_configural.cfa  78 88446 89055 1942.5                              
## fit.tp_sex_weak.cfa        92 88855 89363 2379.7     47.327      14  1.698e-05
## fit.tp_sex_strong.cfa     106 88980 89386 2532.8    142.964      14  < 2.2e-16
## fit.tp_sex_strict.cfa     124 93226 93502 6815.4    251.096      18  < 2.2e-16
##                              
## fit.tp_sex_configural.cfa    
## fit.tp_sex_weak.cfa       ***
## fit.tp_sex_strong.cfa     ***
## fit.tp_sex_strict.cfa     ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_results_sex |>
  kbl() |>
  kable_styling(full_width = F)
                           Df   AIC       BIC       Chisq     Chisq diff  Df diff  Pr(>Chisq)
fit.tp_sex_configural.cfa  78   88445.55  89055.29  1942.470  NA          NA       NA
fit.tp_sex_weak.cfa        92   88854.83  89362.94  2379.744  47.32741    14       1.7e-05
fit.tp_sex_strong.cfa      106  88979.92  89386.41  2532.841  142.96395   14       0.0e+00
fit.tp_sex_strict.cfa      124  93226.48  93502.31  6815.394  251.09631   18       0.0e+00
# tabulate additional fit indices
fit_indices_sex <- data.frame(
  Model = c("Configural", "Metric", "Scalar", "Residual"),
  CFI = c(
    fitMeasures(fit.tp_sex_configural.cfa, "cfi"),
    fitMeasures(fit.tp_sex_weak.cfa, "cfi"),
    fitMeasures(fit.tp_sex_strong.cfa, "cfi"),
    fitMeasures(fit.tp_sex_strict.cfa, "cfi")
  ),
  TLI = c(
    fitMeasures(fit.tp_sex_configural.cfa, "tli"),
    fitMeasures(fit.tp_sex_weak.cfa, "tli"),
    fitMeasures(fit.tp_sex_strong.cfa, "tli"),
    fitMeasures(fit.tp_sex_strict.cfa, "tli")
  ),
  RMSEA = c(
    fitMeasures(fit.tp_sex_configural.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_weak.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_strong.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_strict.cfa, "rmsea")
  )
)

# output fit indices
fit_indices_sex |>
  kbl() |>
  kable_styling(full_width = F)
Model       CFI        TLI        RMSEA
Configural  0.9631258  0.9489434  0.0826569
Metric      0.9547546  0.9468858  0.0843060
Scalar      0.9520036  0.9510980  0.0808940
Residual    0.8676621  0.8847380  0.1241927

4.2.3 Latent means plot

lavaan::parameterestimates(fit.tp_sex_strict.cfa) %>% 
  filter(grepl("abuse_.$", lhs) & op == "~1") %>% 
  ggplot(aes(x = lhs, 
             y = est, 
             group = as.factor(group), 
             color = as.factor(group), 
             ymin = ci.lower, 
             ymax = ci.upper)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_errorbar(width = 0.3, position = position_dodge(width = 0.5), alpha = 0.7) +
  geom_line(size = 1, position = position_dodge(width = 0.5), alpha = 0.7) +
  scale_color_lancet(labels=c("12-13", "14-15", "16-17")) +
  labs(x = "Subscale", y = "Latent mean", color = "Age (in years)", 
       title = "Latent Means by Age") +
  theme_minimal() +
  theme(
    legend.position = "top",
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 10),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10, color = "black"),
    axis.text.y = element_text(size = 10, color = "black"),
    axis.title = element_text(size = 12, color = "black"),
    plot.title = element_text(size = 14, hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(size = 12, hjust = 0.5)
  )

4.3 MI: Country

The MI models grouped by country had convergence issues, so we attempted to fit the CFA within each country separately.

Vietnam appeared to be problematic: the model could not reach a solution there.

unique_countries <- unique(inv_df$COUNTRY)
for (country in unique_countries) {
  group_data <- inv_df[inv_df$COUNTRY == country, ]
  cat("Running CFA for country:", country, "\n")
  tryCatch({
    fit <- lavaan::cfa(
      model = abuse.model,
      data = group_data,
      meanstructure = TRUE,
      missing = "pairwise",
      estimator = "MLR"
    )
    # print(summary(fit))
  }, error = function(e) {
    cat("Error for country:", country, "\n", e$message, "\n")
  })
}
## Running CFA for country: Ethiopia 
## Running CFA for country: Kenya 
## Running CFA for country: Mozambique 
## Running CFA for country: Namibia 
## Running CFA for country: Tanzania 
## Running CFA for country: Uganda 
## Running CFA for country: Cambodia 
## Running CFA for country: Indonesia 
## Running CFA for country: Malaysia 
## Running CFA for country: Philippines 
## Running CFA for country: Thailand 
## Running CFA for country: Vietnam
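The loop above discards each fit at the end of the iteration. To keep the per-country fits for later inspection, they can instead be collected in a named list and checked for convergence (a sketch using `lavaan::lavInspect()`):

```r
# fit the CFA per country, keeping each result (or the error message)
country_fits <- lapply(setNames(nm = unique_countries), function(cc) {
  tryCatch(
    lavaan::cfa(abuse.model,
                data = inv_df[inv_df$COUNTRY == cc, ],
                meanstructure = TRUE,
                missing = "pairwise",
                estimator = "MLR"),
    error = function(e) e$message
  )
})

# flag countries whose model converged to a solution
sapply(country_fits, function(f)
  inherits(f, "lavaan") && lavaan::lavInspect(f, "converged"))
```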

Check whether each country's item covariance matrix is positive definite. In fact, none of them are, which is problematic.

for (c in unique_countries) {
  country_data <- inv_df[inv_df$COUNTRY == c, ]  # subset by the loop country
  cov_matrix <- cov(country_data[, c("abuse_a", "abuse_b", "abuse_c", "abuse_d", "abuse_e",
                                     "abuse_f", "abuse_g", "abuse_h", "abuse_i")],
                    use = "pairwise.complete.obs")
  # is.positive.definite() is provided by the matrixcalc package
  print(paste("Country:", c, "is positive definite (T/F) =", is.positive.definite(cov_matrix)))
}
## [1] "Country: Ethiopia is positive definite (T/F) = FALSE"
## [1] "Country: Kenya is positive definite (T/F) = FALSE"
## [1] "Country: Mozambique is positive definite (T/F) = FALSE"
## [1] "Country: Namibia is positive definite (T/F) = FALSE"
## [1] "Country: Tanzania is positive definite (T/F) = FALSE"
## [1] "Country: Uganda is positive definite (T/F) = FALSE"
## [1] "Country: Cambodia is positive definite (T/F) = FALSE"
## [1] "Country: Indonesia is positive definite (T/F) = FALSE"
## [1] "Country: Malaysia is positive definite (T/F) = FALSE"
## [1] "Country: Philippines is positive definite (T/F) = FALSE"
## [1] "Country: Thailand is positive definite (T/F) = FALSE"
## [1] "Country: Vietnam is positive definite (T/F) = FALSE"
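A covariance matrix fails this check when its smallest eigenvalue is zero or negative (to numerical precision). Inspecting the minimum eigenvalue per country shows how far each matrix is from positive definiteness (a sketch; `abuse_items` is introduced here only for brevity):

```r
# smallest eigenvalue of each country's item covariance matrix;
# values <= 0 indicate a non-positive-definite matrix
abuse_items <- paste0("abuse_", letters[1:9])
min_eigen <- sapply(unique(inv_df$COUNTRY), function(cc) {
  m <- cov(inv_df[inv_df$COUNTRY == cc, abuse_items],
           use = "pairwise.complete.obs")
  min(eigen(m, symmetric = TRUE, only.values = TRUE)$values)
})
sort(min_eigen)
```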
# before transforming, inspect the problematic country (Vietnam) separately
country_data <- inv_df[inv_df$COUNTRY == "Vietnam", ]

We will therefore try a log transform. Given that we established above that all items are extremely skewed toward the lowest response option, a log transform is a reasonable remedial step.

Pre-transform statistics:

# Specify the columns to transform
columns_to_transform <- c("abuse_a", "abuse_b", "abuse_c", "abuse_d", "abuse_e", "abuse_f", "abuse_g", "abuse_h", "abuse_i")

# pre transform statistics
summary(inv_df[columns_to_transform])
##     abuse_a        abuse_b         abuse_c         abuse_d         abuse_e     
##  Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.00   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.00   Median :1.000   Median :1.000   Median :1.000   Median :1.000  
##  Mean   :1.25   Mean   :1.266   Mean   :1.149   Mean   :1.134   Mean   :1.115  
##  3rd Qu.:1.00   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :5.00   Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##     abuse_f         abuse_g         abuse_h         abuse_i     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.000   Median :1.000   Median :1.000   Median :1.000  
##  Mean   :1.086   Mean   :1.091   Mean   :1.089   Mean   :1.082  
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000
# save pre-transform
inv_df_pre <- inv_df

# Apply log transformation using lapply
inv_df[columns_to_transform] <- lapply(inv_df[columns_to_transform], function(x) log(x + 1))
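For reference, log(x + 1) maps the five ordinal values deterministically, which is where the post-transform minimum (log 2 ≈ 0.6931) and maximum (log 6 ≈ 1.7918) come from:

```r
# the five possible Likert values and their transformed counterparts
likert <- 1:5
round(log(likert + 1), 4)
# [1] 0.6931 1.0986 1.3863 1.6094 1.7918
```

Because the mapping is monotone, the ordering of responses is unchanged; the transform only compresses the upper categories.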

Post-transform statistics:

# post transform statistics
summary(inv_df[columns_to_transform])
##     abuse_a          abuse_b          abuse_c          abuse_d      
##  Min.   :0.6931   Min.   :0.6931   Min.   :0.6931   Min.   :0.6931  
##  1st Qu.:0.6931   1st Qu.:0.6931   1st Qu.:0.6931   1st Qu.:0.6931  
##  Median :0.6931   Median :0.6931   Median :0.6931   Median :0.6931  
##  Mean   :0.7816   Mean   :0.7857   Mean   :0.7457   Mean   :0.7402  
##  3rd Qu.:0.6931   3rd Qu.:0.6931   3rd Qu.:0.6931   3rd Qu.:0.6931  
##  Max.   :1.7918   Max.   :1.7918   Max.   :1.7918   Max.   :1.7918  
##     abuse_e          abuse_f          abuse_g          abuse_h      
##  Min.   :0.6931   Min.   :0.6931   Min.   :0.6931   Min.   :0.6931  
##  1st Qu.:0.6931   1st Qu.:0.6931   1st Qu.:0.6931   1st Qu.:0.6931  
##  Median :0.6931   Median :0.6931   Median :0.6931   Median :0.6931  
##  Mean   :0.7330   Mean   :0.7228   Mean   :0.7241   Mean   :0.7237  
##  3rd Qu.:0.6931   3rd Qu.:0.6931   3rd Qu.:0.6931   3rd Qu.:0.6931  
##  Max.   :1.7918   Max.   :1.7918   Max.   :1.7918   Max.   :1.7918  
##     abuse_i      
##  Min.   :0.6931  
##  1st Qu.:0.6931  
##  Median :0.6931  
##  Mean   :0.7212  
##  3rd Qu.:0.6931  
##  Max.   :1.7918
inv_df |>
  tidyr::pivot_longer(cols=starts_with("abuse_"),
                    names_to="var",
                    values_to="val") |>
  ggplot(aes(x=val)) +
  geom_bar(fill = "skyblue", color = "black") +
  facet_wrap(~var, scales="free_y") +
  labs(
    x="log(Likert response + 1)",
    y="Frequency",
    caption="Note. Ordinal scale responses range from 1 to 5:\n
    (1) Never (2) Rarely (3) Sometimes (4) Always (5) Often"
  ) +
  theme_minimal()

Note that residual (strict) invariance did not hold: the model with equality-constrained residual variances could not be computed, so the residuals constraint is commented out in the strict model below, making it identical to the scalar model.

inv_df$COUNTRY <- as.character(inv_df$COUNTRY)

# remove problematic countries
inv_df1 <- inv_df |>
  dplyr::select(COUNTRY, contains('abuse_')) |>
  dplyr::filter(!(COUNTRY %in% c('Vietnam')))

# specify model
# abuse.model <- "A =~ abuse_f + abuse_g + abuse_h + abuse_i
#                 B =~ abuse_a + abuse_b + abuse_c + abuse_d + abuse_e"

# i) Configural model
abuse.model %>% lavaan::cfa(
  # data_cfa %>% remove_labels(),
          labelled::remove_labels(inv_df1),
          effect.coding = T,
          meanstructure = T,
          missing = "pairwise",
          group = "COUNTRY",
          estimator = "MLR") -> fit.tp_sex_configural.cfa

# ii) Metric model
abuse.model %>% lavaan::cfa(
  # data_cfa %>% remove_labels(),
          labelled::remove_labels(inv_df1),
          effect.coding = T,
          meanstructure = T,
          missing = "pairwise",
          group = "COUNTRY",
          estimator = "MLR",
          group.equal = "loadings") -> fit.tp_sex_weak.cfa

## iii) Scalar model
abuse.model %>% lavaan::cfa(
  # data_cfa %>% remove_labels(),
          labelled::remove_labels(inv_df1), 
          effect.coding = T,
          meanstructure = T,
          missing = "pairwise",
          group = "COUNTRY",
          estimator = "MLR",
          group.equal = c("loadings", "intercepts")) -> fit.tp_sex_strong.cfa

# iv) Residual model
abuse.model %>% lavaan::cfa(
  # data_cfa %>% remove_labels(),
          labelled::remove_labels(inv_df1),
          effect.coding = T,
          missing = "pairwise",
          meanstructure = T,
          group = "COUNTRY",
          estimator = "MLR",
          group.equal = c("loadings", "intercepts")) -> fit.tp_sex_strict.cfa
          # group.equal = c("loadings", "intercepts", "residuals")) -> fit.tp_sex_strict.cfa

As an alternative, we can try Bayesian estimation via bsem() from the blavaan package:

fit_configural <- bsem(
  model = abuse.model,
  data = labelled::remove_labels(inv_df),
  group = "COUNTRY",
  meanstructure = TRUE,
  missing = "pairwise"
)

4.3.1 Results

Interpretation of results:

  1. Configural vs metric (weak) model: If the chi-square difference test between the configural and metric model is non-significant (p > 0.05), it suggests that constraining the loadings to be equal across groups does not significantly worsen the model fit. This supports metric invariance.

  2. Metric (Weak) vs. Scalar (Strong) model: If the chi-square difference test between the metric and scalar model is non-significant (p > 0.05), it indicates that constraining both the loadings and intercepts to be equal across groups does not significantly worsen the model fit. This supports scalar invariance.

  3. Scalar (Strong) vs. Residual (Strict) Model: If the chi-square difference test between the scalar and residual model is non-significant (p > 0.05), it suggests that constraining the residual variances to be equal across groups does not significantly worsen the model fit. This supports residual (strict) invariance.
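Because chi-square difference tests are oversensitive in large samples, these decisions are often supplemented by the |ΔCFI| ≤ .01 rule of thumb (Cheung & Rensvold, 2002). A small hedged helper (delta_cfi_ok is an illustrative name, not part of our pipeline), applied to the CFI values reported later in this section:

```r
# flag whether a more constrained model degrades CFI by more than the cutoff
delta_cfi_ok <- function(cfi_free, cfi_constrained, cutoff = 0.01) {
  (cfi_free - cfi_constrained) <= cutoff
}
delta_cfi_ok(0.9281, 0.9024)  # configural -> metric: delta CFI ~ 0.026, FALSE
delta_cfi_ok(0.9024, 0.8927)  # metric -> scalar: delta CFI ~ 0.010, TRUE
```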

# Pairwise comparisons using anova function
anova_configural_metric <- anova(fit.tp_sex_configural.cfa, fit.tp_sex_weak.cfa)
anova_metric_scalar <- anova(fit.tp_sex_weak.cfa, fit.tp_sex_strong.cfa)
anova_scalar_residual <- anova(fit.tp_sex_strong.cfa, fit.tp_sex_strict.cfa)

# Print pairwise comparison results
print(anova_configural_metric)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                            Df     AIC     BIC  Chisq Chisq diff Df diff
## fit.tp_sex_configural.cfa 286 -123234 -121027 3496.3                   
## fit.tp_sex_weak.cfa       356 -122153 -120448 4717.3     145.44      70
##                           Pr(>Chisq)    
## fit.tp_sex_configural.cfa               
## fit.tp_sex_weak.cfa        3.228e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(anova_metric_scalar)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                        Df     AIC     BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## fit.tp_sex_weak.cfa   356 -122153 -120448 4717.3                              
## fit.tp_sex_strong.cfa 426 -121790 -120587 5220.2      486.5      70  < 2.2e-16
##                          
## fit.tp_sex_weak.cfa      
## fit.tp_sex_strong.cfa ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(anova_scalar_residual)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                        Df     AIC     BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## fit.tp_sex_strong.cfa 426 -121790 -120587 5220.2                              
## fit.tp_sex_strict.cfa 426 -121790 -120587 5220.2          0       0
# Extract fit indices for each model
fit_indices_configural <- fitMeasures(fit.tp_sex_configural.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_metric <- fitMeasures(fit.tp_sex_weak.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_scalar <- fitMeasures(fit.tp_sex_strong.cfa, c("cfi", "tli", "rmsea", "srmr"))
fit_indices_residual <- fitMeasures(fit.tp_sex_strict.cfa, c("cfi", "tli", "rmsea", "srmr"))

# Calculate fit indices differences
cfi_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["cfi"] - fitmeasures(fit.tp_sex_strong.cfa)["cfi"]
tli_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["tli"] - fitmeasures(fit.tp_sex_strong.cfa)["tli"]
rmsea_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["rmsea"] - fitmeasures(fit.tp_sex_strong.cfa)["rmsea"]
srmr_diff_metric_scalar <- fitmeasures(fit.tp_sex_weak.cfa)["srmr"] - fitmeasures(fit.tp_sex_strong.cfa)["srmr"]

cfi_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["cfi"] - fitmeasures(fit.tp_sex_strict.cfa)["cfi"]
tli_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["tli"] - fitmeasures(fit.tp_sex_strict.cfa)["tli"]
rmsea_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["rmsea"] - fitmeasures(fit.tp_sex_strict.cfa)["rmsea"]
srmr_diff_scalar_residual <- fitmeasures(fit.tp_sex_strong.cfa)["srmr"] - fitmeasures(fit.tp_sex_strict.cfa)["srmr"]

# Create a data frame with the results
results_sex <- data.frame(
  Comparison = c("Configural vs Metric", "Metric vs Scalar", "Scalar vs Residual"),
  # use the "Chisq diff" / "Df diff" columns, not the per-model chi-square and df
  Chi_Square_Diff = c(anova_configural_metric$`Chisq diff`[2], anova_metric_scalar$`Chisq diff`[2], anova_scalar_residual$`Chisq diff`[2]),
  DF_Diff = c(anova_configural_metric$`Df diff`[2], anova_metric_scalar$`Df diff`[2], anova_scalar_residual$`Df diff`[2]),
  p_value = c(anova_configural_metric$`Pr(>Chisq)`[2], anova_metric_scalar$`Pr(>Chisq)`[2], anova_scalar_residual$`Pr(>Chisq)`[2]),
  # note: row 1 reports the configural model's absolute fit indices;
  # rows 2-3 report differences between adjacent models
  CFI = c(fit_indices_configural["cfi"], cfi_diff_metric_scalar, cfi_diff_scalar_residual),
  TLI = c(fit_indices_configural["tli"], tli_diff_metric_scalar, tli_diff_scalar_residual),
  RMSEA = c(fit_indices_configural["rmsea"], rmsea_diff_metric_scalar, rmsea_diff_scalar_residual),
  SRMR = c(fit_indices_configural["srmr"], srmr_diff_metric_scalar, srmr_diff_scalar_residual)
)

# output model comparison indices
results_sex |>
  kbl() |>
  kable_styling(full_width = F)
Comparison Chi_Square_Diff DF_Diff p_value CFI TLI RMSEA SRMR
Configural vs Metric 145.4403 70 3e-07 0.9281332 0.9004922 0.1136520 0.0499795
Metric vs Scalar 486.5017 70 0e+00 0.0096898 -0.0088386 0.0049339 -0.0035725
Scalar vs Residual 0.0000 0 NA 0.0000000 0.0000000 0.0000000 0.0000000
# Compare the models
anova_results_sex <- anova(fit.tp_sex_configural.cfa, fit.tp_sex_weak.cfa, fit.tp_sex_strong.cfa, fit.tp_sex_strict.cfa)

# Print the comparison results
print(anova_results_sex)
## 
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
## 
## lavaan NOTE:
##     The "Chisq" column contains standard test statistics, not the
##     robust test that should be reported per model. A robust difference
##     test is a function of two standard (not robust) statistics.
##  
##                            Df     AIC     BIC  Chisq Chisq diff Df diff
## fit.tp_sex_configural.cfa 286 -123234 -121027 3496.3                   
## fit.tp_sex_weak.cfa       356 -122153 -120448 4717.3     145.44      70
## fit.tp_sex_strong.cfa     426 -121790 -120587 5220.2     486.50      70
## fit.tp_sex_strict.cfa     426 -121790 -120587 5220.2       0.00       0
##                           Pr(>Chisq)    
## fit.tp_sex_configural.cfa               
## fit.tp_sex_weak.cfa        3.228e-07 ***
## fit.tp_sex_strong.cfa      < 2.2e-16 ***
## fit.tp_sex_strict.cfa                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_results_sex |>
  kbl() |>
  kable_styling(full_width = F)
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.tp_sex_configural.cfa 286 -123234.3 -121027.4 3496.259 NA NA NA
fit.tp_sex_weak.cfa 356 -122153.2 -120447.9 4717.337 145.4403 70 3e-07
fit.tp_sex_strong.cfa 426 -121790.4 -120586.6 5220.178 486.5017 70 0e+00
fit.tp_sex_strict.cfa 426 -121790.4 -120586.6 5220.178 0.0000 0 NA
# tabulate additional fit indices
fit_indices_sex <- data.frame(
  Model = c("Configural", "Metric", "Scalar", "Residual"),
  CFI = c(
    fitMeasures(fit.tp_sex_configural.cfa, "cfi"),
    fitMeasures(fit.tp_sex_weak.cfa, "cfi"),
    fitMeasures(fit.tp_sex_strong.cfa, "cfi"),
    fitMeasures(fit.tp_sex_strict.cfa, "cfi")
  ),
  TLI = c(
    fitMeasures(fit.tp_sex_configural.cfa, "tli"),
    fitMeasures(fit.tp_sex_weak.cfa, "tli"),
    fitMeasures(fit.tp_sex_strong.cfa, "tli"),
    fitMeasures(fit.tp_sex_strict.cfa, "tli")
  ),
  RMSEA = c(
    fitMeasures(fit.tp_sex_configural.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_weak.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_strong.cfa, "rmsea"),
    fitMeasures(fit.tp_sex_strict.cfa, "rmsea")
  )
)

# output fit indices
fit_indices_sex |>
  kbl() |>
  kable_styling(full_width = F)
Model CFI TLI RMSEA
Configural 0.9281332 0.9004922 0.113652
Metric 0.9023645 0.8913942 0.118734
Scalar 0.8926747 0.9002328 0.113800
Residual 0.8926747 0.9002328 0.113800

4.3.2 Item intercepts plot

lavaan::parameterestimates(fit.tp_sex_strict.cfa) %>% 
  filter(grepl("abuse_.$", lhs) & op == "~1") %>% 
  ggplot(aes(x = lhs, 
             y = est, 
             group = as.factor(group), 
             color = as.factor(group), 
             ymin = ci.lower, 
             ymax = ci.upper)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_errorbar(width = 0.3, position = position_dodge(width = 0.5), alpha = 0.7) +
  geom_line(linewidth = 1, position = position_dodge(width = 0.5), alpha = 0.7) +
  # the grouping variable is COUNTRY (see the cfa() calls above), so the legend
  # shows countries; viridis handles all 11 groups (the lancet palette tops out at 9)
  scale_color_viridis_d() +
  labs(x = "Item", y = "Intercept estimate", color = "Country", 
       title = "Item Intercepts by Country") +
  theme_minimal() +
  theme(
    legend.position = "top",
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 10),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10, color = "black"),
    axis.text.y = element_text(size = 10, color = "black"),
    axis.title = element_text(size = 12, color = "black"),
    plot.title = element_text(size = 14, hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(size = 12, hjust = 0.5)
  )

5 Item Response Theory (IRT)

IRT complements classical test theory by modelling the ordered response categories directly.

Treating the scales as ordered categories allows polytomous IRT to be applied, using the 'graded' item type corresponding to the graded response model (GRM). The GRM estimates thresholds between adjacent response categories, as well as a discrimination parameter for each item.
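To make the GRM concrete, here is a minimal sketch of how category probabilities arise from a discrimination parameter and ordered thresholds. The parameter values are illustrative, loosely echoing the estimates reported below; grm_probs is a hypothetical helper, not a mirt function:

```r
# P(X >= k | theta) = 1 / (1 + exp(-a * (theta - b_k))); category
# probabilities are differences between adjacent boundary curves
grm_probs <- function(theta, a, b) {
  p_ge <- c(1, 1 / (1 + exp(-a * (theta - b))), 0)  # boundaries, plus P(X >= 1) = 1
  -diff(p_ge)                                       # P(X = k) for k = 1..K
}
p <- grm_probs(theta = 0, a = 2.1, b = c(1.3, 1.9, 2.8, 3.7))
sum(p)  # the five category probabilities sum to 1
```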

set.seed(seed_value) # set global seed

irt_df <- inv_df_pre |>
  dplyr::select(contains("abuse_")) |>
  drop_na()

# define mirt model
irt.model <- mirt::mirt.model('
  F1=6-9 # Items 6 to 9 load on Factor 1
  F2=1-5 # Items 1 to 5 load on Factor 2
  COV = F1*F2 # Allow F1 and F2 to covary
')

# confirm IRT model
print(irt.model)
## $x
##      Type  Parameters
## [1,] "F1"  "6-9"     
## [2,] "F2"  "1-5"     
## [3,] "COV" "F1*F2"   
## 
## attr(,"class")
## [1] "mirt.model"
# run polytomous IRT
# mirt handles both dichotomous and polytomous response data
poly.model <- mirt::mirt(data=irt_df,
                         model=irt.model,
                         itemtype='graded',
                         # itemtype='gpcm',
                         verbose = FALSE,
                         SE = TRUE)

5.0.1 Loadings

The standardized loadings can be read like factor coefficients. All items load cleanly onto their intended factors.

The discrimination ('a') parameters are also high for all items, indicating that each item discriminates well between respondents at different trait levels.
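The loadings and slopes are two parameterizations of the same quantity: under the usual D = 1.702 logistic-to-normal-ogive scaling, lambda = (a/1.702) / sqrt(1 + (a/1.702)^2). A quick check against the values in this section (the function name is ours):

```r
# convert an IRT slope to a standardized factor loading
a_to_loading <- function(a) {
  d <- a / 1.702          # rescale logistic slope to the normal-ogive metric
  d / sqrt(1 + d^2)
}
round(a_to_loading(4.597012), 3)  # abuse_f: 0.938, matching the summary() output below
round(a_to_loading(2.135986), 3)  # abuse_a: 0.782
```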

# Summary of the model (item parameters, factor loadings, etc.)
summary(poly.model)
##            F1    F2    h2
## abuse_a 0.000 0.782 0.612
## abuse_b 0.000 0.818 0.668
## abuse_c 0.000 0.909 0.826
## abuse_d 0.000 0.930 0.865
## abuse_e 0.000 0.926 0.857
## abuse_f 0.938 0.000 0.879
## abuse_g 0.949 0.000 0.900
## abuse_h 0.913 0.000 0.834
## abuse_i 0.922 0.000 0.849
## 
## SS loadings:  3.463 3.827 
## Proportion Var:  0.385 0.425 
## 
## Factor correlations: 
## 
##       F1 F2
## F1 1.000   
## F2 0.912  1
# Extract IRT parameters (coefficients)
coef_table <- coef(poly.model, IRTpars = TRUE)

knitr::kable(coef_table, caption = "IRT Parameter Estimates")
IRT Parameter Estimates (graded response model: a1/a2 = discrimination on F1/F2; b1–b4 = category thresholds; items listed in data order)

abuse_a
              a1        a2       b1       b2       b3       b4
par            0  2.135986 1.345068 1.852152 2.829305 3.670974
CI_2.5        NA  2.009188 1.296118 1.784350 2.706831 3.461265
CI_97.5       NA  2.262784 1.394019 1.919954 2.951778 3.880684

abuse_b
              a1        a2       b1       b2       b3       b4
par            0  2.416153 1.282899 1.697920 2.553163 3.112983
CI_2.5        NA  2.273938 1.238889 1.640191 2.453535 2.970263
CI_97.5       NA  2.558368 1.326910 1.755648 2.652792 3.255703

abuse_c
              a1        a2       b1       b2       b3       b4
par            0  3.705665 1.492956 1.921919 2.540201 2.985046
CI_2.5        NA  3.439300 1.449819 1.862843 2.446996 2.851279
CI_97.5       NA  3.972030 1.536094 1.980994 2.633407 3.118813

abuse_d
              a1        a2       b1       b2       b3       b4
par            0  4.306734 1.522475 1.928306 2.457446 2.959799
CI_2.5        NA  3.964745 1.480073 1.870624 2.371332 2.830114
CI_97.5       NA  4.648722 1.564877 1.985988 2.543561 3.089484

abuse_e
              a1        a2       b1       b2       b3       b4
par            0  4.161242 1.639946 1.987113 2.519888 2.952028
CI_2.5        NA  3.826686 1.593444 1.925910 2.428840 2.823856
CI_97.5       NA  4.495797 1.686446 2.048316 2.610937 3.080199

abuse_f
              a1        a2       b1       b2       b3       b4
par     4.597012         0 1.794987 2.101703 2.628392 3.058580
CI_2.5  4.174579        NA 1.746046 2.040227 2.534514 2.920307
CI_97.5 5.019445        NA 1.843929 2.163179 2.722271 3.196852

abuse_g
              a1        a2       b1       b2       b3       b4
par     5.109723         0 1.766311 2.057543 2.501754 2.920536
CI_2.5  4.616550        NA 1.719593 1.999775 2.417920 2.800280
CI_97.5 5.602896        NA 1.813028 2.115311 2.585587 3.040791

abuse_h
              a1        a2       b1       b2       b3       b4
par     3.809791         0 1.831825 2.162952 2.605608 2.971778
CI_2.5  3.492278        NA 1.778549 2.094413 2.508625 2.841274
CI_97.5 4.127305        NA 1.885102 2.231490 2.702591 3.102283

abuse_i
              a1        a2       b1       b2       b3       b4
par     4.043520         0 1.863182 2.199260 2.582683 2.955683
CI_2.5  3.691858        NA 1.809585 2.130006 2.489281 2.828158
CI_97.5 4.395183        NA 1.916780 2.268515 2.676086 3.083209

GroupPars
        MEAN_1  MEAN_2  COV_11    COV_21  COV_22
par          0       0       1 0.9121076       1
CI_2.5      NA      NA      NA 0.8987043      NA
CI_97.5     NA      NA      NA 0.9255109      NA

5.1 Trace curves

5.1.1 Trace Plot (Category Response Curves)

# itemplot(poly.model, 1, type='info')
# itemplot(poly.model, 2, type='trace')

# Plotting item characteristic curves (ICC) for visualization
plot(poly.model, type = 'trace')

The 3D surface produced by the Test Information Function (TIF) plot in the next subsection can be interpreted as follows:

  • Peak of the Information Function: The peak (light cyan color, ~20 units) indicates the range of trait values (θ₁, θ₂) where the test is most informative. In this case, it looks like the test provides the most information when both θ₁ and θ₂ are around zero (centered on the latent trait dimensions). This suggests that the test is most precise for individuals with average levels of the latent traits.

  • Lower Information at Extremes: The flat areas with low information (purple, close to 0 on the I(θ) scale) indicate that the test provides little information at extreme values of θ₁ and θ₂ (very high or low levels of the traits). In other words, the test may not be as reliable for individuals with very high or very low abilities on these dimensions

  • Interpretation in Multidimensional Context: Since this is a multidimensional IRT model, the information is spread across two latent traits (θ₁ and θ₂). The test is designed to measure abilities on these two dimensions, and the plot shows how much information the test provides for various combinations of these traits. For example, the test is most informative for respondents whose traits fall in the middle (θ₁ ≈ 0, θ₂ ≈ 0), and less so for those who are extreme in one or both traits (θ₁ ≈ ±6, θ₂ ≈ ±6).

SUMMARY: The test is most informative around average levels of the two latent traits (θ₁ ≈ 0, θ₂ ≈ 0) and much less informative at the extremes, so it measures average respondents precisely but may need additional items or modifications to capture information at very high or very low trait levels.
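The shape of an information surface can be understood from its unidimensional building block: for a single logistic boundary curve, Fisher information is a²·p·(1 − p), which peaks where the curve crosses p = 0.5 (i.e. at θ = b). A small illustrative sketch (names are ours); summing such curves over items and categories yields the test information surface plotted in the next subsection:

```r
# information contributed by one 2PL-type boundary curve
info_2pl <- function(theta, a, b) {
  p <- 1 / (1 + exp(-a * (theta - b)))
  a^2 * p * (1 - p)
}
theta <- seq(-6, 6, by = 0.01)
theta[which.max(info_2pl(theta, a = 2, b = 1.5))]  # information peaks at theta = b
```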

5.1.2 Test Information Function Plot

# Factor loadings plot
plot(poly.model, type = 'info')

5.2 Other Diagnostic Plots

5.2.1 Test Characteristic Curve

plot(poly.model, type = 'score')

5.2.2 Item Information Plot

plot(poly.model, type = 'infotrace')

5.3 Model fit statistics

# Compute item fit statistics (mirt::M2() could be used for overall model fit)
fit_indices <- itemfit(poly.model)

# Get log-likelihood, AIC, and BIC
logLik(poly.model)
## [1] -25623.78
anova(poly.model)
##                 AIC    SABIC       HQ      BIC    logLik
## poly.model 51339.56 51527.29 51452.32 51673.47 -25623.78

6 Factor descriptives

An individual is coded as endorsing a factor if any item within that factor has a response above the minimum Likert value (1 = Never).
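On the raw 1–5 responses, "any item above the minimum" is equivalent to "the row sum exceeds the number of items", which is the check implemented below. A toy illustration (resp is made-up data):

```r
resp <- rbind(c(1, 1, 1, 1),   # all items at the minimum: factor absent
              c(1, 2, 1, 1))   # one item above the minimum: factor present
rowSums(resp) > ncol(resp)
# [1] FALSE  TRUE
```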

6.1 Main: 2-factor solution

  • F1: “abuse_f”, “abuse_g”, “abuse_h”, “abuse_i”
  • F2: “abuse_a”, “abuse_b”, “abuse_c”, “abuse_d”, “abuse_e”.

Plot descriptives by presence or absence of each factor.

# use the untransformed (pre-log) data so the sum thresholds
# correspond to raw Likert scores
fct2_desc <- inv_df_pre
fct2_desc$f1_sum <- rowSums(inv_df_pre[, c("abuse_f", "abuse_g", "abuse_h", "abuse_i")])
fct2_desc$f2_sum <- rowSums(inv_df_pre[, c("abuse_a", "abuse_b", "abuse_c", "abuse_d", "abuse_e")])

# a factor is "present" if any item exceeds the minimum,
# i.e. the row sum exceeds the number of items
fct2_desc$f1_bin <- ifelse(fct2_desc$f1_sum > 4, 1, 0)
fct2_desc$f2_bin <- ifelse(fct2_desc$f2_sum > 5, 1, 0)

# fct2_desc$f1_bin <- fct2_desc$f1_sum > 4  # TRUE if sum > 4
# fct2_desc$f2_bin <- fct2_desc$f2_sum > 5  # TRUE if sum > 5

# Create a list of sets for the UpSet plot
binary_data <- data.frame(
  Factor1 = as.integer(fct2_desc$f1_bin),
  Factor2 = as.integer(fct2_desc$f2_bin)
)

UpSetR::upset(binary_data, 
      sets = c("Factor1", "Factor2"),
      order.by = "freq",
      main.bar.color = "#0073C2FF",  # Main bar color
      sets.bar.color = "#D55E00",     # Sets bar color
      matrix.color = "#009E73" # Color for intersections
)

ven1 <- ggVennDiagram(
  list(
    F1 = which(fct2_desc$f1_bin == 1),
    F2 = which(fct2_desc$f2_bin == 1)
  ),
  label_alpha = 0.5
) +
  # region fill is continuous (frequency), so set the palette via the scale
  scale_fill_distiller(name = 'Frequency', palette = 'Blues', direction = 1)
  # scale_fill_distiller('Blues', direction=1)

ven1

# ggVennDiagram(
#   list(
#     F1 = which(fct2_desc$f1_bin == 1),
#     F2 = which(fct2_desc$f2_bin == 1)
#   ),
#   force_upset = TRUE
# )

6.2 Sub-optimal: 3-factor solution

  • F1: “abuse_f”, “abuse_g”, “abuse_h”, “abuse_i”
  • F2: “abuse_c”, “abuse_d”, “abuse_e”
  • F3: “abuse_a”, “abuse_b”

fct3_desc <- inv_df_pre  # untransformed (pre-log) data, as above
fct3_desc$f1_sum <- rowSums(inv_df_pre[, c("abuse_f", "abuse_g", "abuse_h", "abuse_i")])
fct3_desc$f2_sum <- rowSums(inv_df_pre[, c("abuse_c", "abuse_d", "abuse_e")])
fct3_desc$f3_sum <- rowSums(inv_df_pre[, c("abuse_a", "abuse_b")])

fct3_desc$f1_bin <- ifelse(fct3_desc$f1_sum > 4, 1, 0)
fct3_desc$f2_bin <- ifelse(fct3_desc$f2_sum > 3, 1, 0)
fct3_desc$f3_bin <- ifelse(fct3_desc$f3_sum > 2, 1, 0)

# Create a list of sets for the UpSet plot
binary_data <- data.frame(
  Factor1 = as.integer(fct3_desc$f1_bin),
  Factor2 = as.integer(fct3_desc$f2_bin),
  Factor3 = as.integer(fct3_desc$f3_bin)
)

UpSetR::upset(binary_data, 
      sets = c("Factor1", "Factor2", "Factor3"),
      order.by = "freq",
      main.bar.color = "#0073C2FF",  # Main bar color
      sets.bar.color = "#D55E00",     # Sets bar color
      matrix.color = "#009E73" # Color for intersections
)

ven2 <- ggVennDiagram(
  list(
    F1 = which(fct3_desc$f1_bin == 1),
    F2 = which(fct3_desc$f2_bin == 1),
    F3 = which(fct3_desc$f3_bin == 1)
  ),
  label_alpha = 0.5
) +
  scale_fill_distiller(name = 'Frequency', palette = 'Blues', direction = 1)

ven2

7 R Session Info

Output R session information for reproducibility. This is a custom function.

print_system_info()
## System Datetime: 2024-10-01 17:07:03 
## R Version: R version 4.2.2 (2022-10-31 ucrt) 
## Operating System: Windows 10 x64 (build 22000) 
## 
## Loaded Packages:
## UpSetR : 1.4.0 
## ggVennDiagram : 1.5.2 
## mice : 3.16.0 
## matrixcalc : 1.0-6 
## caret : 6.0-94 
## naniar : 1.1.0 
## sjPlot : 2.8.16 
## data.table : 1.15.2 
## DT : 0.32 
## ggcorrplot : 0.1.4.1 
## table1 : 1.4.3 
## ggsci : 3.0.3 
## kableExtra : 1.4.0 
## sjlabelled : 1.2.0 
## openxlsx : 4.2.5.2 
## splitstackshape : 1.4.8 
## fabricatr : 1.0.2 
## labelled : 2.13.0 
## mirt : 1.41 
## lattice : 0.20-45 
## semPlot : 1.1.6 
## semTools : 0.5-6 
## lavaan : 0.6-17 
## psych : 2.4.3 
## haven : 2.5.4 
## lubridate : 1.9.3 
## forcats : 1.0.0 
## stringr : 1.5.1 
## dplyr : 1.1.4 
## purrr : 1.0.2 
## readr : 2.1.5 
## tidyr : 1.3.1 
## tibble : 3.2.1 
## ggplot2 : 3.5.0 
## tidyverse : 2.0.0