An R package for analyzing ILSAs’ data
By Plamen Vladkov Mirazchiyski, International Educational Research and Evaluation Institute (INERI), Slovenia
Several statistical packages are available for analyzing data from international large-scale assessments and surveys (ILSAs). The first available and most commonly used package is the IEA IDB Analyzer (IEA, 2020). Free (i.e. not proprietary) and open-source tools such as the R packages ‘intsvy’ (Caro & Biecek, 2019), ‘BIFIEsurvey’ (BIFIE et al., 2019) and ‘EdSurvey’ (Bailey et al., 2020) are available as well. All of these packages can analyze ILSA data and handle the statistical issues stemming from the studies’ complex sampling and assessment designs.
The R Analyzer for Large-Scale Assessments (‘RALSA’) (Mirazchiyski & INERI, 2021) is a new package, first released in November 2020. It was built with a focus on user experience and is suitable even for analysts with no prior experience with R. It supports data from all cycles of a broad range of studies: CivED, ICCS, ICILS, RLII, PIRLS (including PIRLS Literacy and ePIRLS), TIMSS (including TIMSS Numeracy and eTIMSS), TiPi (TIMSS and PIRLS joint study), TIMSS Advanced, SITES, TEDS-M, PISA, PISA for Development, TALIS, and TALIS Starting Strong Survey (also known as TALIS 3S).
The functionalities of the package are organized into two sets: data preparation and data analysis. The first and most important data preparation feature of ‘RALSA’ is that it converts data from SPSS into native R data sets for further use (data preparation or analysis). It can also convert PISA data from cycles prior to 2015, where the data sets are not provided in SPSS file format but as text files with an SPSS control syntax. While converting the data into native R data sets, it attaches the study name, cycle and respondent type as attributes to the data. These are later used by all other data preparation and analysis functions. Unlike the usual R handling of missing data, where every missing value becomes ‘NA’, ‘RALSA’ attaches the user-defined missing values to each variable.
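The idea of carrying study metadata and user-defined missing codes along with the data can be illustrated with a minimal base R sketch. It is not the internal implementation of ‘RALSA’: the toy data, the variable name ‘ITEM01’, the attribute names and the helper ‘valid_values’ are all assumptions made only for illustration.

```r
# Toy data standing in for a converted student file (all values are made up).
students <- data.frame(
  IDSTUD = 1:5,
  ITEM01 = c(1, 2, 9, 1, 2)  # 9 plays the role of a user-defined missing code
)

# Attach study-level metadata to the data set
# (attribute names are illustrative assumptions).
attr(students, "study")     <- "TIMSS"
attr(students, "cycle")     <- "2019"
attr(students, "file.type") <- "student background"

# Attach the missing code to the variable instead of recoding it to NA,
# so the information about why a value is missing is preserved.
attr(students$ITEM01, "missings") <- c(Omitted = 9)

# Hypothetical helper: drop the attached missing codes before computing.
valid_values <- function(x) {
  x[!x %in% attr(x, "missings")]
}

# Frequencies of the valid responses only.
print(table(valid_values(students$ITEM01)))
```

Because the missing code stays in the data and is only described by the attribute, a function can decide at analysis time whether to exclude it or to report it as a separate category.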
‘RALSA’ also brings its own variable recoding function which handles the user-defined missing values. Another important data preparation function is the variable codebook, which prints or saves a codebook for all variables or just for the ones selected by the user. ‘RALSA’ can also merge data from different respondent types. In doing so, it prevents merging data from respondents which should not be merged given a study’s design (e.g. merging student and teacher data in ICCS and ICILS vs. TIMSS and PIRLS) and takes care of the user-defined missing values.
The second set of functions are the analysis functions. Currently, ‘RALSA’ supports the following analysis types: percentages and means, percentiles, benchmarks/proficiency levels, correlations (Pearson or Spearman), linear regression, and binary logistic regression. The number of analysis types will grow in the future, and new features will be added to the existing ones.
The most distinctive features, compared to other R packages, are the output export and the graphical user interface (GUI). The analysis functions export an MS Excel workbook with separate sheets for estimates, model statistics, analysis information, and the syntax used for the analysis, which can be used to replicate or repeat the computations if the data are updated. ‘RALSA’ is the only R package so far which comes with a GUI. The GUI is written entirely in R using the ‘shiny’ package (Chang et al., 2021). It is a web application which runs locally in the default browser on the computer, regardless of the operating system (MS Windows, macOS, Linux or any other where R can be installed).
Regardless of whether the analyst uses the traditional R command-line mode or the GUI, ‘RALSA’ will automatically recognize the study data and the available respondents and will apply all routines needed to handle the complex sampling and assessment designs of that study and those respondents. The examples below use TIMSS 2019 grade 4 data and demonstrate how to use ‘RALSA’ via the command-line mode, although every single feature is available through the GUI. The examples focus on the most important details, not on every feature of each function. The ‘RALSA’ support website (INERI, 2020) provides extensive details and examples for all its features. The very first step in using the package is to convert the data into native R data sets:
lsa.convert.data(inp.folder = "C:/TIMSS_2019_G4_Data", out.folder = "C:/Converted")
The ‘lsa.convert.data’ function will automatically recognize the study and cycle the data stem from, convert each separate file (adding all important attributes to the data set and variables, see above), and save each one as an ‘.RData’ file in the output directory under the same file name. While the syntax above converts the data for all countries in the folder, it is also possible to convert the data for a set of countries only. Here is how the student and teacher data files (converted in the previous step) for Australia, Canada, Chile, Italy, Japan, Qatar, and South Africa can be merged:
lsa.merge.data(inp.folder = "C:/Converted", ISO = c("aus", "can", "chl", "ita", "jpn", "qat", "zaf"), file.types = list(asg = NULL, atg = NULL), out.file = "C:/Merged/TIMSS_2019_G4_ASG_ATG.RData")
If the ‘ISO’ argument is omitted, the student and teacher data for all countries available in the folder will be merged. Note the ‘file.types’ argument: it specifies which respondent types (‘asg’ for student background and ‘atg’ for teacher background) will be merged. ‘NULL’ in this case means “take all variables”; if a vector of variable names is provided instead, only the selected variables will be taken. Also, because teacher data are merged with the student data, the student-teacher linkage file is added automatically in the process to ensure students and teachers are matched properly, and only the design variables pertinent to this merge combination are added. Now the merged student and teacher data can be used in an analysis. The following computes the average overall mathematics achievement of students whose teachers belong to different age groups (under 25, 25-29, 30-39, 40-49, 50-59, and 60 or more):
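Why a linkage file is needed for matching students and teachers can be shown with a small base R sketch. It only illustrates the general idea with made-up toy tables and illustrative column names; it is not how ‘lsa.merge.data’ is implemented, and it ignores weights and all other design variables.

```r
# Toy tables; all IDs and values are made up for illustration.
students <- data.frame(IDSTUD = c(101, 102, 103), score = c(510, 480, 530))
teachers <- data.frame(IDTEACH = c(1, 2), age_group = c("30-39", "50-59"))

# Linkage table: one row per student-teacher pair.
# Student 103 is taught by both teachers and therefore appears twice.
linkage <- data.frame(
  IDSTUD  = c(101, 102, 103, 103),
  IDTEACH = c(1, 2, 1, 2)
)

# Students are first joined to the linkage table, then to the teachers.
merged <- merge(merge(students, linkage, by = "IDSTUD"), teachers, by = "IDTEACH")
merged <- merged[order(merged$IDSTUD, merged$IDTEACH), ]
print(merged)
```

The merged table has one row per student-teacher pair rather than one row per student, which is one reason why a dedicated weight is needed when such data are analyzed.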
lsa.pcts.means(data.file = "C:/Merged/TIMSS_2019_G4_ASG_ATG.RData", split.vars = "ATBG03", PV.root.avg = "ASMMAT")
Note that only the root of the variable names of the five plausible values (PVs) for overall mathematics achievement had to be specified; the ‘lsa.pcts.means’ function automatically includes all five in the computations. It was also not necessary to specify a weight variable (although this is possible); the mathematics teacher weights are selected automatically. The path to the output file was not specified either (although it can be), so the function saves an ‘Analysis.xlsx’ file in the working directory. The output is opened automatically in the default spreadsheet application (this can be suppressed via the ‘open.output’ argument).
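What using all five PVs in the computations means can be sketched in base R with simulated data. This is a simplified illustration of the usual way estimates from plausible values are combined, not the code ‘RALSA’ runs: the data are random, the weight column is hypothetical, and the sampling variance component (estimated with replication methods in studies like TIMSS) is deliberately left out.

```r
set.seed(1)
n <- 200

# Expand a PV root into the five variable names (ASMMAT01 ... ASMMAT05).
pv_root  <- "ASMMAT"
pv_names <- paste0(pv_root, sprintf("%02d", 1:5))

# Simulated data: a hypothetical weight and five simulated plausible values.
toy <- data.frame(weight = runif(n, 0.5, 2))
for (pv in pv_names) {
  toy[[pv]] <- rnorm(n, mean = 500, sd = 100)
}

# Step 1: compute the weighted mean separately for each plausible value.
pv_means <- sapply(pv_names, function(pv) weighted.mean(toy[[pv]], toy$weight))

# Step 2: the final estimate is the average of the five per-PV estimates.
estimate <- mean(pv_means)

# Step 3: the imputation variance is the variance between the per-PV
# estimates, inflated by (1 + 1/M), where M is the number of PVs.
M <- length(pv_names)
imputation_var <- (1 + 1 / M) * var(pv_means)

print(round(pv_means, 2))
print(round(c(estimate = estimate, imputation_var = imputation_var), 2))
```

A full standard error would add the sampling variance to the imputation variance before taking the square root.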
The limited examples above are just a quick demonstration of the capabilities of the ‘RALSA’ package. The entire package was created with a focus on user experience. All functions have unified arguments, which makes the analyst’s work easier. All computational routines are applied automatically with specified defaults which the analyst can change if needed. The package performs automatic checks and stops with human-readable error messages if the user attempts something that violates the design of a particular study, or if certain arguments are misspecified. To get started and obtain detailed information on the ‘RALSA’ functionality, visit the help section on the dedicated website (http://ralsa.ineri.org/user-guide/). If interested in news related to ‘RALSA’, subscribe to the mailing list by sending an email to email@example.com.
Bailey, P., Emad, A., Huo, H., Lee, M., Liao, Y., Lishinski, A., Nguyen, T., Xie, Q., Yu, J., & Zhang, T. (2020). EdSurvey: Analysis of NCES Education Survey and Assessment Data [Computer software manual]. Retrieved from https://cran.r-project.org/package=EdSurvey (R package version 2.6.1).
BIFIE, Robitzsch, A., & Oberwimmer, K. (2019). BIFIEsurvey: Tools for Survey Statistics in Educational Assessment [Computer software manual]. Retrieved from https://cran.r-project.org/package=BIFIEsurvey (R package version 3.3-12).
Caro, D., & Biecek, P. (2019). intsvy: International Assessment Data Manager [Computer software manual]. Retrieved from https://cran.r-project.org/package=intsvy (R package version 2.4).
Chang, W., Cheng, J., Allaire, J. J., Xie, Y., & McPherson, J. (2021). shiny: Web Application Framework for R [Computer software manual]. Retrieved from https://cran.r-project.org/package=shiny (R package version 1.6.0).
IEA. (2020). IDB Analyzer [Computer software manual]. Retrieved from https://www.iea.nl/data-tools/tools#section-308 (Version 4.0.39).
INERI. (2020). RALSA: R Analyzer for Large-Scale Assessments. Retrieved from http://ralsa.ineri.org/
Mirazchiyski, P. & INERI. (2021). RALSA: R Analyzer for Large-Scale Assessments [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=RALSA (R package version 0.90.3).