Virtual useR! 2024

This presentation will introduce the newly designed 'Cloud Fundamentals for scientific workflows' lesson, created using The Carpentries Workbench template, including the R packages sandpaper, pegboard and varnish. This lesson explores the principles of cloud data management as applied in an innovative environment science data platform. This interactive lesson navigates participants through the platform's repertoire of functionalities, informing them on elements like data storage and logical representation, organisation of databases via data layers and containers, the management of diverse working environments, and the platform's unique operation as a 'data lake' for raw data storage. Furthermore, the session will unravel an end-to-end data management process on the cloud platform, spotlighting real-world usage through a practical example using a tide gauge dataset. Designed with a learner-centric approach, this lesson includes engaging challenges to enhance comprehension. It offers participants a solid grounding in cloud computing and its practical application in scientific research.

Speakers

Maria Rivera Araya

Senior Scientific Data Specialist, Department of Environment, Science and Innovation

Maria Rivera Araya, PhD, is a cloud data scientist and passionate educator with over ten years of experience in science projects in academia and government. Her multidisciplinary background spans the natural sciences, social sciences, and technology. She is currently working on migrating... Read More →

Tuesday July 2, 2024 04:30 - 04:50 CEST
YouTube Premier

Virtual Session Presentation, Community

05:00 CEST

CANCELLED: A Promising Power Analysis Package for Structural Equation Models: Package semPower - Teck Kiang Tan, National University of Singapore

Tuesday July 2, 2024 05:00 - 05:20 CEST

A Promising Power Analysis Package for Structural Equation Model Package semPower Camtasia pptx

Structural equation modeling (SEM) is a widespread modeling approach for composite hypotheses, often used to test theories, verify model measurement properties, and obtain unbiased estimates. However, SEM studies seldom include a power analysis to determine the sample size needed to achieve adequate power for detecting the hypothesized effect. This is due partly to the lack of a comprehensive SEM power analysis software package. The session introduces the easy-to-understand semPower package, which provides both global and local power analysis, for establishing the power of a model and of a specific hypothesis respectively. Three models will be illustrated: first, varying the factor loadings of a confirmatory factor analysis model to determine power and sample size; second, varying two loadings of a mediation model concurrently to determine the required sample size; and third, varying the covariance of a latent growth model for a local power analysis. The R syntax is shown to illustrate the usefulness of the semPower package, along with graphing the model via the semPlot package. Users will find the syntax simple and can easily carry out power analysis for their own studies.

Speakers

Teck Kiang Tan

A Promising Power Analysis Package for Structural Equation Models: Package semPower, National University of Singapore

Dr. Teck Kiang Tan is a senior research fellow at the National University of Singapore. His research interests involving R packages include R graphics, doubly classified models, multilevel modeling, cognitive diagnostic models, sequence analysis, informative hypotheses, and longitudinal... Read More →

Tuesday July 2, 2024 05:00 - 05:20 CEST
YouTube Premier

05:30 CEST

Decomposition Based Deep Learning Model for Forecasting - Dr. Kapil Choudhary, Agriculture University Jodhpur

Tuesday July 2, 2024 05:30 - 05:50 CEST

Hybrid models that combine decomposition and deep learning techniques are among the most promising methods for improving the accuracy of time series forecasting. Each decomposition technique decomposes a time series into a set of intrinsic mode functions (IMFs), and the obtained IMFs are modelled and forecasted separately using deep learning models. Finally, the forecasts of all IMFs are combined to provide an ensemble output for the time series. The predictive ability of the developed models is assessed using evaluation criteria such as root mean squared error, mean absolute percentage error, and mean absolute error.

Speakers

Dr. Kapil Choudhary

Dr. Kapil Choudhary, Agriculture University Jodhpur

Dr. Kapil Choudhary developed an early interest in forecasting and dedicated his academic pursuits to mastering the intricacies of agriculture statistics. He earned his Master's and Ph.D. from ICAR-IARI, New Delhi. His research interests are time series forecasting, machine learning... Read More →

Tuesday July 2, 2024 05:30 - 05:50 CEST
YouTube Premier

06:00 CEST

DeβARMA: An R-Shiny Application for Modeling Antimicrobial Resistance Rate Data with Zeros or Ones - Jevitha Lobo, Novo Nordisk

Tuesday July 2, 2024 06:00 - 06:20 CEST

Antimicrobial resistance (AMR) has become a major public health challenge in the 21st century, posing a global health crisis that jeopardizes modern medicine. Traditional time-series methods such as the autoregressive moving average (ARMA) model have been used to analyze and forecast AMR rates. However, these methods are unsuitable for rates or proportions that include exact zeros or ones. This study proposes a new time-series model called DeβARMA (Degenerate Beta Auto-regressive moving average) that fits data in the interval [0, 1) or (0, 1]. The model is designed to predict the AMR rate so that providers can plan accordingly. Healthcare providers need to be alerted in real time to the AMR rate patterns in their respective settings so that they can better anticipate changes in resistance rates over time and develop more effective antimicrobial management policies. Shiny is an R tool for creating interactive applications for tasks such as exploratory data analysis, statistical inference, and regression analysis. This presentation highlights DeβARMA, a specialized tool for modeling time-series data with zeros or ones that handles lag effects and regressor variables.

Speakers

Jevitha Lobo

Ms., Novo Nordisk

Jevitha Lobo is a Senior Statistician at Novo Nordisk in Bengaluru, India, with over 3 years of teaching experience and over 4 years of research experience. Her areas of expertise include Statistical Inference, Advanced Regression, Time-Series Modeling, and Statistical Methods in... Read More →

Tuesday July 2, 2024 06:00 - 06:20 CEST
YouTube Premier

06:30 CEST

R4CR: R Education for Clinical Researchers via Quarto - Jinhwan Kim, Zarathu Co., Ltd.

Tuesday July 2, 2024 06:30 - 06:50 CEST

Clinical research is one of the fastest growing fields in the world, and R is becoming increasingly important as a way to handle data, especially as more and more studies are conducted with small numbers of patients or in collaboration with multiple institutions. Clinical researchers have typically focused on study design, data collection, and validation, leaving the coding to professional developers. Now, however, more and more clinical researchers are trying to use R themselves, including for data management. To this end, we have been providing R training for clinical researchers, but there is a lot of room for improvement compared to professional training services, such as reflecting the latest R-related technology trends and improving the training experience. In this session, I will share how we decided to use Quarto, what we considered in order to provide R training for clinical researchers, how we actually used Quarto, the advantages and disadvantages of using Quarto, our achievements, and our future plans.

Speakers

Jinhwan Kim

R developer, Zarathu Co., Ltd.

Jinhwan is an R/Shiny developer with a background in bioinformatics. He has dedicated his career to crafting data products using the R ecosystem across diverse industries as a Data Scientist. Currently, he is a key contributor at Zarathu, where he specializes in developing R packages and... Read More →

Tuesday July 2, 2024 06:30 - 06:50 CEST
YouTube Premier

Virtual Session Presentation, Community

07:00 CEST

Enhancing the R Dev Guide: A GSoD 2022 Journey and Ongoing Progress - Saranjeet Kaur Bhogal, RSE Asia Association & Lluís Revilla, IrsiCaixa

Tuesday July 2, 2024 07:00 - 07:20 CEST

The R Development Guide (R Dev Guide) serves as a comprehensive resource to facilitate the onboarding process for new contributors to the R project. Its initial draft emerged in 2021, made possible through funding from the R Foundation. Subsequently, a significant update took place during Google Season of Docs (GSoD) 2022. This update involved the inclusion of new chapters and sections, addressing important aspects such as translations, adopting a git workflow for proposed patches, and providing novice-friendly instructions on installing R from source. Work has continued post-GSoD 2022 to implement further updates and improvements, many in response to feedback on earlier versions. This talk reviews the current status of the guide, shedding light on the positive impact it has had on the R community, particularly among new contributors. With the enhancements to the Guide, contributing has become more accessible, especially for newcomers, compared to relying solely on the official documentation. The talk aims to showcase the evolving nature of the guide and underscore its role in fostering a more inclusive and supportive environment for those engaging with the R project.

Speakers

Saranjeet Kaur Bhogal

Research Software Engineer, Imperial College London

Saranjeet has a Master's degree in statistics and is a 2023 Software Sustainability Institute Fellow. She has been involved with software engineering communities throughout her career including open source programs like Google Summer of Code 2020, Code for Science and Society’s Digital... Read More →

user virtual24 r dev guide pdf

Tuesday July 2, 2024 07:00 - 07:20 CEST
YouTube Premier

Virtual Session Presentation, Community

09:00 CEST

Making Better Error Messages with Rlang and Cli - Emil Hvitfeldt, Posit PBC

Tuesday July 2, 2024 09:00 - 09:20 CEST

An important part of writing software revolves around functionality and features. A sometimes overlooked part is what happens when something goes wrong. There are many reasons something can go wrong: faulty input, user error, and deprecation are a few examples. Regardless of the reason, we should strive to let the user know as soon and as informatively as possible so they can get back on track with what they are doing. This talk showcases how to make better error messages with the packages rlang and cli.

Speakers

Emil Hvitfeldt

NA, Posit PBC

Emil Hvitfeldt is a software engineer at Posit and part of the tidymodels team’s effort to improve R’s modeling capabilities. He maintains several packages within the realms of modeling, text analysis, and color palettes. Trying to make slidecrafting a well-respected verb. He... Read More →

Tuesday July 2, 2024 09:00 - 09:20 CEST
YouTube Premier

Virtual Session Presentation, Programming

10:00 CEST

Connecting Shiny Apps to Chemotion-ELN - Konrad Krämer, KIT / The Compound Platform

Tuesday July 2, 2024 10:00 - 10:20 CEST

Chemotion-ELN is an open source electronic lab notebook (ELN) developed by the ComPlat group at KIT in collaboration with the consortium NFDI4Chem within the National Research Data Infrastructure. This session delves into the seamless integration of Shiny apps with Chemotion-ELN, enabling users to effortlessly transmit data to these apps for analysis. The results are then seamlessly conveyed back to the electronic lab journal, enhancing its capabilities. Through this bidirectional communication, the ELN becomes highly extensible, empowering researchers with a versatile platform. In this presentation, we spotlight Biostats, an exemplary Shiny app. This application is tailored for fundamental statistical tasks, encompassing data wrangling, visualization, and the multifaceted nature of statistical tests. Witness the transformative potential of connecting Chemotion-ELN with Shiny apps, exemplified through the intuitive functionality of the Biostats app.

Speakers

Konrad Krämer

Dr, KIT / The Compound Platform

I'm Konrad Krämer. My Ph.D. research focused on developing numerical software using R and C++. Currently, I work as a postdoc specializing in lab automation and web development. I'm excited to contribute to the evolving landscape of biology, computation, and automation.

Connecting shiny apps to Chemotion ELN pptx

ELN cut webm

OverviewBiostats cut webm

Visualisation cut webm

ResultsOfBiostatsInELN cut webm

Tuesday July 2, 2024 10:00 - 10:20 CEST
YouTube Premier

Virtual Session Presentation, Reporting

10:00 CEST

Automating Updates to Shiny Dashboards Deployed on Shinyserver - Clinton David, Oxford Policy Management [Pre-Registration Required]

Tuesday July 2, 2024 10:00 - 11:00 CEST

Zoom

As R developers, most of us have heard of or used the shiny package to develop web applications, and can attest that it has emerged as a popular framework for developing dashboards in R. These shiny dashboards provide a dynamic platform for data exploration, analysis, and dissemination, making them invaluable tools for researchers, analysts, and decision-makers. However, as the volume and complexity of data grow, maintaining and updating these dashboards manually can become a time-consuming and error-prone process. Automating updates to shiny dashboards deployed on shiny server offers a solution to this challenge, enabling developers to streamline the deployment pipeline, enhance efficiency, and ensure the timely dissemination of accurate information. This session will be a hands-on tutorial that takes you through setting up a workflow in which a push event to the main branch of a GitHub repository triggers a webhook, the webhook sends a payload to an API, and the API in turn triggers a bash script that does a git pull from the same repository.
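The push-to-deploy chain described above can be sketched as follows. This is a minimal illustration in Python, not the tutorial's own implementation: the function names, the app directory, and the choice to call git directly (instead of going through a separate bash script) are all assumptions made for this sketch. It relies only on two documented GitHub behaviours: push payloads carry a `ref` field such as `refs/heads/main`, and signed deliveries include an `X-Hub-Signature-256` header of the form `sha256=<hex HMAC of the raw body>`.

```python
import hashlib
import hmac
import json
import subprocess


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header with our own HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), header.encode())


def is_push_to_main(body: bytes) -> bool:
    """True when the push payload targets the main branch."""
    payload = json.loads(body)
    return payload.get("ref") == "refs/heads/main"


def handle_push(secret, body, header, app_dir, run=subprocess.run):
    """Hypothetical API handler: pull the repository only for signed pushes to main.

    `run` is injectable so the logic can be exercised without touching git.
    Returns True if a pull was triggered, False otherwise.
    """
    if not signature_is_valid(secret, body, header):
        return False
    if not is_push_to_main(body):
        return False
    # Stand-in for the bash script in the workflow: update the deployed app in place.
    run(["git", "-C", app_dir, "pull", "--ff-only"], check=True)
    return True
```

In a real deployment, `handle_push` would be called from whatever web framework receives the webhook, with `app_dir` pointing at the directory that shiny server serves the app from. Pushes to other branches and requests with a bad signature are ignored.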

Tutorial Materials: https://github.com/oyogo/useR2024_tutorial_materials

Speakers

Clinton David

Data Scientist, Oxford Policy Management

Clinton Oyogo David is a data scientist with 7 years of experience, currently working at Oxford Policy Management (OPM) in the data innovations team. His day-to-day tasks are mostly developing data pipelines, data dashboards, machine learning, spatial analytics and data wrangling... Read More →

Tuesday July 2, 2024 10:00 - 11:00 CEST
Zoom

Virtual Tutorial, Programming

10:30 CEST

One Container to Rule Them All - Magnus Mengelbier, Limelogic AB

Tuesday July 2, 2024 10:30 - 10:50 CEST

The use of containers across organizations and their GxP R environments is quite common. The rationale ranges from providing isolated, project-specific R compute environments based on a standard validated container to simply having a cheap and easily accessible virtual environment in a pre-existing IT architecture. In most circumstances, this approach relies on maintaining different frozen validated container images, sometimes referred to as golden images, for each particular use case. If instead we utilize container image provenance and some very simple controls in our validation strategy, we could create one validated container image that is then used as the basis for any container images designed for particular use cases. We consider how the approach impacts and, in many ways, simplifies the four classic use cases of Posit Workbench for interactive development, {shiny} for apps, {plumber} for APIs, and a back-end compute environment for batch processing.

Speakers

Magnus Mengelbier

Managing Director, Limelogic AB

Magnus is currently the Managing Director of Limelogic, a contributor, collaborator and independent consultant based in southern Sweden with over 25 years of experience in the Life Science industry. A keen advocate of simple programming approaches with a focus on GxP, compliance... Read More →

Tuesday July 2, 2024 10:30 - 10:50 CEST
YouTube Premier

Virtual Session Presentation, Programming

11:00 CEST

CANCELLED: Regression Models for [0, 1] Responses Using Betareg and Crch - Achim Zeileis, Universität Innsbruck

Tuesday July 2, 2024 11:00 - 11:20 CEST

In this presentation we show how to model data from the closed unit interval [0, 1] using extended-support beta regression and heteroscedastic two-limit tobit models. In contrast to zero- and/or one-inflated beta regression, both approaches only require estimation of a single latent process that captures both the distribution of the inner observations and the point masses for observations on the boundaries at 0 and/or 1. The heteroscedastic two-limit tobit model does so by fitting a Gaussian distribution censored at 0 and 1 which is conveniently available in the R package "crch". Extended-support beta regression has recently been proposed and implemented in the development version of the "betareg" package. It contains both classic beta regression and heteroscedastic two-limit tobit as special cases, shifting between the two with just one additional parameter. Both approaches are illustrated by modeling reading accuracy scores of children and investments in an economic loss aversion experiment, respectively, discussing the models' relative (dis)advantages.

Speakers

Achim Zeileis

Professor of Statistics, Universität Innsbruck

Achim Zeileis is Professor of Statistics at the Faculty of Economics and Statistics at Universität Innsbruck. Being an R user since version 0.64.0, Achim is co-author of a variety of CRAN packages such as zoo, colorspace, party(kit), sandwich, or exams. In the R community he is active... Read More →

Tuesday July 2, 2024 11:00 - 11:20 CEST
YouTube Premier

11:30 CEST

Openstatsguide - Minimum Viable Good Practices for High Quality Statistical Software Packages - Daniel Sabanés Bové, RCONIS

Tuesday July 2, 2024 11:30 - 11:50 CEST

The success of the R programming language is largely due to its ease of creating and sharing R packages. We propose an opinionated framework called “openstatsguide”, published on openstatsware.org/guide.html, which can guide R package developers towards a minimum set of good practices. As far as we know from our literature search, this is the first attempt at providing a small and concise set of rules for package developers. The guide is not limited to R: it can also be used for other functionally oriented programming languages used in data science, and we give examples for R, Python, and Julia. Rather than a full and detailed how-to guide, we keep “openstatsguide” short and high-level, thus lowering the barrier to entry for novice and seasoned developers alike. Our hope is that this guide can increase the adoption of software engineering good practices in the statistics community. In this talk we describe the motivation and scope of “openstatsguide”, its relationship with existing work, the set of good practices, the maintenance model, and ideas for future complementary guides produced by the openstatsware.org working group.

Speakers

Daniel Sabanés Bové

Ph.D., RCONIS

Daniel Sabanés Bové studied statistics and obtained his PhD in 2013. He started his career with 5 years in Roche as a biostatistician, then worked 2 years at Google as a Data Scientist, before rejoining Roche in 2020, where he founded and led the Statistical Engineering team. Daniel... Read More →

Tuesday July 2, 2024 11:30 - 11:50 CEST
YouTube Premier

Virtual Session Presentation, Reporting

11:30 CEST

Missing Data Exploration, Imputation, and Evaluation - Hanne Oberman, Utrecht University [Pre-Registration Required]

Tuesday July 2, 2024 11:30 - 12:30 CEST

Zoom

Missing data are ubiquitous, pervasive, and often ignored in statistical analyses. Unfortunately, default methods such as complete case analysis may lead to biased and invalid results. This hands-on tutorial aims to equip data analysts with knowledge and skills to validly handle missing data using the popular R package {mice}. {mice} implements multiple imputation by chained equations, a flexible method for imputing (i.e. filling in) missing entries. The session will combine theoretical insights with hands-on exercises. Attendees will first learn the fundamentals of missing data theory, and then gain practical experience in addressing real-world missing data problems through guided demonstrations and exercises. Attendees are encouraged to bring their own incomplete datasets, to implement and evaluate their newfound skills. By the end of the session, attendees will be able to make informed decisions on how to validly handle missing data in their own data analysis projects.

Speakers

Hanne Oberman

MSc, Utrecht University

Statistician interested in data visualization, interdisciplinarity, and open science. Hanne is a PhD candidate in Methodology and Statistics at Utrecht University, working on computational evaluation and data visualization in the Missing Data research group. Core developer for the... Read More →

Tuesday July 2, 2024 11:30 - 12:30 CEST
Zoom

Virtual Tutorial, Statistical Methods

12:00 CEST

Back to the Drawing Board - How to Quickly Mock-up and Test Your Application User Interface Designs - Barbara Mikulasova, Katalyze Data

Tuesday July 2, 2024 12:00 - 12:20 CEST

The buzz around the {shiny} package has only increased over the past year. Now, more and more R programmers use interactive dashboards to share their data insights with a diverse stakeholder audience – from their colleagues to project managers and sponsors. But even for seasoned R programmers, any dashboard or application development is a resource intensive process and, if not planned carefully, can cause the team delays on the project deliveries, or worse, suboptimal user acceptance rates. Quick and targeted prototyping can prevent many of these situations from happening. This talk is aimed at fellow data scientists who want to start implementing {shiny} applications in their workflow but have little to no background in web application development. In this presentation, the author shares lessons learned from their own mistakes during the development of various production ready {shiny} applications to help novice developers avoid similar pitfalls. The attendees will see strategies for creating and testing a quick user-interface mock-up design using {fakir} and {shinipsum} packages. Additionally, they will learn how to translate the prototype into a minimum viable product design.

Speakers

Barbara Mikulasova

Trainer Consultant, Katalyze Data

Barbara is a statistical programmer and an aspiring R shiny developer. She is passionate about creating interactive tools in R to help stakeholders better understand their data. She has been working at Katalyze Data for two years during which she developed and delivered a range of... Read More →

Tuesday July 2, 2024 12:00 - 12:20 CEST
YouTube Premier

Virtual Session Presentation, Reporting

12:30 CEST

Demystifying the HP Filter with an Easy-to-Use R Package - Alexandru Monahov, Bank of England

Tuesday July 2, 2024 12:30 - 12:50 CEST

This session introduces participants to the Hodrick-Prescott (HP) filter, a data series smoothing technique frequently used in economics and finance. It briefly explains the underlying mathematics and presents the easy-to-use hpfilter R package, which calculates both the one- and two-sided implementations of the filter. The HP filter is a mathematical tool used to smooth out short-term fluctuations in data and reveal underlying long-term trends. Popularized in the 1990s by economists Robert Hodrick and Edward Prescott, it has become a staple in fields like macroeconomics, real business cycle theory and finance. Participants will get a hands-on tour of the package and learn to apply the HP filter to a concrete use case in finance. They will learn how to extract the trend and cycle components from a financial time series, plot the results and succinctly interpret the findings.
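For orientation, the standard textbook form of the two-sided HP filter is sketched below. The notation (observed series y_t, trend τ_t, cycle c_t, smoothing parameter λ) is an assumption of this sketch and is not taken from the session or from the hpfilter package.

```latex
% y_t: observed series, \tau_t: trend, c_t: cycle, \lambda: smoothing parameter
\min_{\tau_1,\dots,\tau_T}\;
  \sum_{t=1}^{T} (y_t - \tau_t)^2
  \;+\; \lambda \sum_{t=2}^{T-1}
  \bigl[(\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1})\bigr]^2,
\qquad c_t = y_t - \tau_t .
```

The first sum penalizes deviations of the trend from the data, and the second penalizes changes in the trend's slope, so a larger λ yields a smoother trend (λ = 1600 is the conventional choice for quarterly data). The one-sided variant estimates each τ_t using only observations up to time t.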

Speakers

Alexandru Monahov

Dr., Bank of England

Alexandru Monahov is a Research Economist in the Bank of England’s Financial Stability Directorate, Stress Testing Strategy Division. His expertise covers research and policy work on systemic risk, prudential regulation, stress-testing and macro-financial linkages by means of econometric... Read More →

Tuesday July 2, 2024 12:30 - 12:50 CEST
YouTube Premier

13:00 CEST

CANCELLED: Unlocking the Business Value of R Programming - Elisha Chitsenga, Crestly Resorts

Tuesday July 2, 2024 13:00 - 13:20 CEST

The challenge for organizations adopting open source technology is to make the best use of their limited resources (people and money) to achieve their goals and objectives. When an entity invests resources in a project that is unlikely to benefit the organization, it misses out on other possibilities. R programmers and users should be aware of an organization's allocation and investment policies to determine whether the business is well-positioned to maximize its resource allocation. Traditionally, discussions between senior management and IT experts about the return on investment (ROI) of IT investments have focused on financial benefits, such as the effects on the organization's budget and finances. To better appreciate and assess nonfinancial benefits, such as the implications for mission performance and operational outcomes, it is advised that they be made visible and verifiable using algorithms that convert them into monetary units. In this session, we will identify the most significant intellectual and affective benefits that R programming has provided, and has yet to provide, to the community.

Speakers

Elisha Chitsenga

Software Developer, Community, and Business Leader, Crestly Resorts

Elisha is a software developer with over 10 years of expertise and an accounting degree, a leader in his community and company, and a fully accredited information systems auditor. His main goals are to use cloud-based solutions, Linux, RISC-V, Python, and R coding to help the African... Read More →

Tuesday July 2, 2024 13:00 - 13:20 CEST
YouTube Premier

Virtual Session Presentation, Domain Specific Applications

13:30 CEST

Dandelion Hub - a Central Repository for De-Central Actions for Ecosocial Justice Based on R/Shiny - Wilmar Igl, Private

Tuesday July 2, 2024 13:30 - 13:50 CEST

Social and ecological systems around the world are experiencing multiple crises [1]. Decision makers have shown a lack of ability or will to take action to reduce current social and ecological injustices [2]. However, individuals without formal political or economic power can contribute to peacefully guiding the world back to a sustainable pathway. The Dandelion Hub (https://dhub.global) serves as a central repository for de-central, non-violent actions for ecosocial justice. The Dandelion Hub (DHub) uses a web frontend (R/Shiny) and database backend (MariaDB) to record and report actions across a broad spectrum of non-violent actions according to the classification by Sharp (1973) [3]. As of 2024-01-28, 2,389,238 activists across 615 actions in 180 cities, 50 countries, and 6 continents, who took part in non-violent actions between 2018-08-20 and 2024-01-14, are represented in the repository. Open-source technology, such as R/Shiny, can contribute to the socio-ecological transformation of society. References: [1] WEF (2024). http://tinyurl.com/2995u9jh [2] IPCC (2023). http://tinyurl.com/5cybutfs [3] Sharp, Gene (1973). http://tinyurl.com/2jvhvxwt

Speakers

Wilmar Igl

PhD, Private

Wilmar Igl, PhD, has a background in medical statistics and psychology. He has over 20 years of professional experience in the life sciences and has also been active in the climate movement since 2018. His experiences have resulted in broader interests in eco-social justice and projects... Read More →

Tuesday July 2, 2024 13:30 - 13:50 CEST
YouTube Premier

Virtual Session Presentation, Reporting

14:00 CEST

Rix: Reproducible Environments with Nix - Bruno Rodrigues, Ministry of Research and Higher Education, Luxembourg

Tuesday July 2, 2024 14:00 - 14:20 CEST

I will be talking about {rix}, a new package still in development that leverages the powerful Nix package manager. With Nix, it is possible to create project-specific environments that contain a project-specific version of R and R packages (as well as other tools or languages, if needed). You can use rix and Nix to replace renv and Docker with one single tool. rix provides functions to help you write and deploy Nix expressions (written in the Nix language). These expressions will be the inputs for the Nix package manager, to build sets of software packages and provide them in a reproducible development environment. These environments can be used for interactive data analysis, or reproduced when running pipelines in CI/CD systems. Environments contain R and all the required packages that you need for your project. The Nix R ecosystem currently includes almost the entirety of CRAN and Bioconductor packages. Like with any other programming language and software, it is also possible to install older releases of R packages, or install packages from GitHub at defined states.

Speakers

Bruno Rodrigues

Head of stats department, Ministry of Research and Higher Education, Luxembourg

Bruno is currently employed as the head of the statistics department at the Ministry of Research and Higher Education in Luxembourg. Before joining the public sector, Bruno worked as a data science consultant in one of the big four accounting companies, and before that as a teaching and... Read More →

Tuesday July 2, 2024 14:00 - 14:20 CEST
YouTube Premier

Virtual Session Presentation, Programming

14:30 CEST

MINT+: Web App with R Brains for SDTM Automation - Magdalena Krochmal & Adam Forys, Roche

Tuesday July 2, 2024 14:30 - 14:50 CEST

In the realm of clinical research, a web application known as MINT+ is revolutionizing the process of SDTM automation. At its core, MINT+ utilizes a set of R packages to power the entire solution. Its intuitive React UI empowers users to create custom SDTM mapping specifications, accommodating diverse study requirements. Leveraging DocumentDB for data storage, MINT+ enables easy metadata sharing and facilitates reuse across studies, significantly reducing workload and improving accuracy. During this session, we will explore the R-based components that power MINT+ and are responsible for data processing and backend processes. The "rmint.sdtm" package automates SDTM mappings, "rsaffron.api" serves as the backend API, and "roak" allows customization of mappings. Users can address complex scenarios that often arise in the SDTM mapping creation process, making R packages the preferred choice for overcoming industry challenges. With advanced algorithms, a user-friendly interface, and seamless integration, MINT+ streamlines the SDTM creation workflow, greatly reducing the time and effort required.

Speakers

Adam Forys

Mr., Roche

Adam is a Principal Data Scientist at Roche. He is dedicated to building R packages that empower teams working on SDTM. He is committed to collaboration and enjoys guiding others in overcoming technical obstacles and optimizing their data science workflows.

Magdalena Krochmal

Senior Data Scientist, Roche

Magdalena Krochmal is a Senior Data Scientist based in Basel, Switzerland. With a background in biomedical engineering and a Ph.D. in bioinformatics, she has spent three impactful years at Roche. Magdalena is an expert R developer specializing in SDTM automation. Her work centers... Read More →

Tuesday July 2, 2024 14:30 - 14:50 CEST
YouTube Premier

Virtual Session Presentation, Domain Specific Applications

14:30 CEST

Flexible Additive Models for Survival and Event-History Analysis - Andreas Bender & Johannes Piller, LMU [Pre-Registration Required]

Tuesday July 2, 2024 14:30 - 15:30 CEST

Zoom

The Piecewise Exponential Additive Mixed Model (PAMM) has gained popularity in various domains due to its ability to tackle a wide variety of survival tasks and its flexibility to model multivariate non-linear covariate effects, including time-varying effects and cumulative effects. One advantage of this model class is the ability to use different backends for estimation. However, to be useful in practice, these models require pre-processing, which differs depending on the survival task at hand, and post-processing (e.g. transforming estimated parameters to quantities like survival or transition probabilities). The R package pammtools (https://adibender.github.io/pammtools/) facilitates the entire modeling process. In this tutorial, we illustrate how to apply the model class in different settings, including left-truncation, recurrent events and multi-state models.

Speakers

Johannes Piller

Doctoral Candidate, LMU

Johannes Piller is a doctoral candidate at the department of statistics at LMU Munich, specializing in statistical modeling. Previously, he completed his master’s degree in Mathematics at TU Munich.

Andreas Bender

LMU

Andreas Bender is a postdoctoral lecturer and researcher at the department of statistics of LMU Munich, with an interest in (machine learning) survival analysis.

Tuesday July 2, 2024 14:30 - 15:30 CEST
Zoom

Virtual Tutorial, Domain Specific Applications

15:00 CEST

Unlock Your Data Insights Faster: The 'CohortBuilder' Way. - Adam Forys, Roche & Krystian Igras, 7N

Tuesday July 2, 2024 15:00 - 15:20 CEST

Cohort analysis is vital for understanding patterns and trends within datasets, particularly in fields like healthcare, marketing, and user analytics. The 'cohortBuilder' and 'shinyCohortBuilder' packages in R offer a convenient approach to defining and manipulating cohorts. If you're exploring ways to streamline your cohort analysis workflow within Shiny, this talk will introduce you to powerful tools worth exploring. During the presentation, I will demonstrate the core concepts of the 'cohortBuilder' ecosystem, highlighting their strengths in performing cohort analysis and visualization within R Shiny.

Speakers

Adam Forys

Mr., Roche

Krystian Igras

MSc, 7N

Krystian is a Scientific Software Engineer at 7N. For many years he has been involved in business consulting and analytical projects, as well as conducting workshops in the R language. He is currently focused on implementing complex Shiny components operating on Real World Medical Data. He is interested...

Tuesday July 2, 2024 15:00 - 15:20 CEST
YouTube Premier

Virtual Session Presentation, Reporting

16:00 CEST

Leveraging R-Ladies Paris Reach for Community Impact - Chaima Boughanmi, BVA Xsight

Tuesday July 2, 2024 16:00 - 16:20 CEST

This presentation aims to introduce the R-Ladies Paris community, a local chapter of the global organization R-Ladies Global, which strives to reduce gender inequalities and enhance the visibility, participation, and recognition of contributions from underrepresented genders within the R community. We will discuss our strategies to bring R enthusiasts together, fostering a collaborative environment where individuals can learn from each other. We will share insights on how to maintain an active presence to encourage member engagement. Moreover, we'll provide tips and advice on how we make our meetups accessible in order to connect with a broader audience. We will address the resources provided by R-Ladies Global to support the activities of chapters, which could serve as encouragement for you to embark on your R-Ladies journey in the future.

Slides: https://rladiesparis.github.io/rladies_paris_talk_useR2024/

Speakers

Chaima Boughanmi

Data scientist & Business Modeller, BVA Xsight

Chaïma, a junior Data Scientist at BVA in Paris, is an engineer in statistics and information analysis with a master’s degree in Data Science from Université Paris Saclay. Experienced in modelling, data analysis, coding and ML, she has worked across various sectors. With a passion...

Tuesday July 2, 2024 16:00 - 16:20 CEST
YouTube Premier

Virtual Session Presentation, Community

16:30 CEST

Stop Making Spaghetti (Code) - Nicola Rennie, Lancaster University

Tuesday July 2, 2024 16:30 - 16:50 CEST

With an increasing number of academic journals requiring authors to submit code, an increasing number of PhD students developing R packages, and more open source packages requiring maintenance, the list of R programming skills required of new quantitative PhD students is ever growing. Many of these PhD students don’t have backgrounds in computer science, but find themselves writing code and developing software on a daily basis. They don’t always have supervisors with backgrounds in computer science either. So how do we help students go from writing spaghetti code, to working with good software development practices? In this talk, I’ll outline what training is currently offered to PhD students, gaps that have been identified (often by students themselves), and a suggestion of how we can better prepare PhD students for quantitative research so that none of them say “If I knew then what I know now, I would have done things entirely differently.”

Speakers

Nicola Rennie

Lecturer in Health Data Science, Lancaster University

Nicola Rennie is a Lecturer in Health Data Science based within the Centre for Health Informatics, Computing, and Statistics at Lancaster Medical School. Her research interests include applications of statistics and machine learning to healthcare and medicine, communicating data through...

Tuesday July 2, 2024 16:30 - 16:50 CEST
YouTube Premier

Virtual Session Presentation, Community

17:00 CEST

DATA PIPELINE to ANALYZE FODESAF'S CASH FLOW: KEY OUTPUTS in R QUARTO - Roberto Delgado Castro, DIRECCION GENERAL DE DESARROLLO SOCIAL Y ASIGNACIONES FAMILIARES

Tuesday July 2, 2024 17:00 - 17:20 CEST

FODESAF is the main financial instrument of selective social policy in Costa Rica in the fight against poverty. From 2005 to 2022, according to the evolution of FODESAF's cash flow, its income was around US$ 13 billion, resources that have been transferred almost entirely to social programs nationwide. In 2022 alone, the total amount of money transferred to the market was around US$ 1 billion. This large volume of resources in FODESAF's cash flow has been analyzed through a data pipeline designed in R. Its input stage (ingest) comprises all reports and data coming from internal technical offices and external institutions. The processing stage consists of data modeling and tidy packages in R. The final outputs (results) are HTML reports in Quarto© for managerial levels and government officials, as well as Power BI© reports embedded in FODESAF's webpage for consultation by the general public, in order to meet transparency mandates. Thanks to R, large amounts of data are processed to generate high-quality final products that support the design of better public social policies.

Speakers

Roberto Delgado Castro

Mr, DIRECCION GENERAL DE DESARROLLO SOCIAL Y ASIGNACIONES FAMILIARES

Roberto Delgado Castro is a data scientist, researcher and compliance officer of Direccion General de Desarrollo Social y Asignaciones Familiares (DESAF), part of the Ministry of Labor and Social Security. He has degrees in finance and banking, marketing and sales and management...

Tuesday July 2, 2024 17:00 - 17:20 CEST
YouTube Premier

Virtual Session Presentation, Reporting

17:00 CEST

Contributing Translations to R - Gergely Daroczi, Rx Studio Inc. [Pre-Registration Required]

Tuesday July 2, 2024 17:00 - 18:00 CEST

Zoom

The R Project has a global and active community with members speaking different languages around the world, often with the need or preference to use R in a language other than English. To support this, R Core has implemented GNU gettext helpers enabling the translation of messages, warnings, errors, etc. since R version 2.1.0 (April 2005). This tutorial will provide a short overview of the related history; discuss how translations are managed in base R; review the standard PO file format used by gettext; describe the traditional process for contributing patches to R Core, and then introduce Weblate, a web-based PO file editor that simplifies the translation process. By the end of the tutorial, you should be able to translate messages from base R into your natural language of choice.

Speakers

Gergely Daroczi

CTO, Rx Studio Inc.

Gergely Daróczi, PhD, has been an enthusiastic R user & package developer for 20 years; former assistant professor; founder of an R-based reporting webapp at rapporter.net; ex Lead R Dev, then Dir. of Analytics at CARD.com; later Sr. Dir. of Data Operations at System1; currently balancing...

Tuesday July 2, 2024 17:00 - 18:00 CEST
Zoom

Virtual Tutorial, Community

17:30 CEST

Learning Together at the Data Science Learning Community - Jon Harmon, Data Science Learning Community

Tuesday July 2, 2024 17:30 - 17:50 CEST

Do you have a bookshelf full of R books that you’ve been meaning to read, but haven’t gotten around to yet? The Data Science Learning Community (DSLC) can help you achieve your learning goals! You may have heard of us previously as the R4DS Online Learning Community, but we’re about much more than any single book. We organize weekly book clubs to help data science learners and practitioners read books such as R for Data Science, Advanced R, and Mastering Shiny. In contrast with other online book clubs, our safe, nurturing, small-group cohorts finish reading their books cover-to-cover. Come learn the tips and tricks that lead to the success of our clubs. Also learn how we support one another by asking and answering programming questions in our friendly and inclusive Slack help channels, and how we work to ensure that every question receives an answer. Discover how you can help us expand into additional high-level topics by getting the support you need to learn those topics yourself. Whether you’re brand new to R or are a seasoned veteran, we have something for you!

Speakers

Jon Harmon

Executive Director, Data Science Learning Community

Jon is the Executive Director of the Data Science Learning Community, a diverse, friendly, and inclusive community of data science learners and practitioners. He is also an advanced R programming consultant, specializing in interactions between R and the Internet. He is originally...

Tuesday July 2, 2024 17:30 - 17:50 CEST
YouTube Premier

Virtual Session Presentation, Community

18:00 CEST

Detecting Abnormal Fish Behaviors with Machine Learning - Enrique Garcia-Ceja, Tecnologico de Monterrey

Tuesday July 2, 2024 18:00 - 18:20 CEST

In marine biology, understanding fish behavior is essential to detect environmental changes induced by climate change or pollution. Fish behaviors can be characterized by their swimming trajectories. In this session I will present how to use a set of R libraries to analyze fish trajectories captured from underwater video. I’ll guide you through every processing step, beginning with data reading, trajectory visualization, feature extraction, model training, and evaluation. I will present how to train an Isolation Forest model for anomaly detection and how to visualize the results.

Speakers

Enrique Garcia-Ceja

Professor, Tecnologico de Monterrey

Enrique is a professor at Tecnologico de Monterrey University, Mexico. Previously, he worked as a data scientist and as a Researcher at SINTEF, Norway. He did a postdoc at the University of Oslo and received his PhD degree in intelligent systems from Tecnologico de Monterrey University...

Tuesday July 2, 2024 18:00 - 18:20 CEST
YouTube Premier

Virtual Session Presentation, Domain Specific Applications

18:30 CEST

Transforming Data Into Information: Overcoming Challenges in Educational Data Analysis - Natalia da Silva, Universidad de la República, UDELAR

Tuesday July 2, 2024 18:30 - 18:50 CEST

The use of different Learning Management Systems (LMS) for various objectives has become a key tool in education. A huge volume of student and teacher data is generated by LMS on a daily basis. Transforming this data into relevant information for decision-making is a major challenge due to the complexity of the data structure and the difficulty of summarizing the learning process with registered information. This talk focuses on statistical tools for the evaluation and monitoring of LMS use by students and teachers. First, a web application was developed as a tool that allows monitoring the use of educational platforms in a user-friendly manner. Additionally, statistical learning methods were used to predict students' performance in tests using LMS information as predictors. Challenges such as data structure and size present many hurdles in this project. Most of these challenges are addressed using efficient computational tools at each stage of data analysis. Postgres serves as the SQL engine, data.table is used for data wrangling, and shiny, plotly, and ggplot2 are employed for communication and visualization. Finally, tidymodels and dbart are utilized for predictive models.

Speakers

Natalia da Silva

Assistant Professor, Universidad de la República, UDELAR

I am an Assistant Professor in the Department of Statistics at the Universidad de la República. I earned my Ph.D. degree in Statistics from Iowa State University in July 2017, under the supervision of Di Cook and Heike Hofmann. My research interests include supervised learning methods...

Tuesday July 2, 2024 18:30 - 18:50 CEST
YouTube Premier

Virtual Session Presentation, Domain Specific Applications

19:00 CEST

Community Detection for Extremely Large Networks - Aidan Lakshman, University of Pittsburgh

Tuesday July 2, 2024 19:00 - 19:20 CEST

Community detection in graphs has numerous applications from social networks to biology. However, the immense size of modern graphs makes it challenging to accurately detect communities. We set out to benchmark a variety of popular methods available in R to measure their accuracy and time complexity on synthetic and real datasets. Unsurprisingly, we found that less scalable algorithms tend to outperform more computationally efficient ones. To address this issue, we introduce two new variants of the Fast Label Propagation algorithm for clustering extremely large networks, both available in the SynExtend package for R. Our implementations offer accuracy comparable to less scalable approaches while providing linear-time computational scalability. Furthermore, we made it possible to apply our community detection algorithms outside of main memory, which permits community detection on graphs with billions of nodes using less than a gigabyte of RAM. These advances will help democratize scalable analyses by removing the need for expensive supercomputer resources. Together, this work both improves graph community detection and makes these analyses more accessible to researchers.

Speakers

Aidan Lakshman

PhD Candidate, University of Pittsburgh

Aidan Lakshman is a PhD Candidate in Biomedical Informatics at the University of Pittsburgh. His dissertation focuses on developing tools for large-scale comparative genomics. He is expected to graduate in May 2025 and is actively searching for employment opportunities. Aidan is an...

Tuesday July 2, 2024 19:00 - 19:20 CEST
YouTube Premier

19:00 CEST

Deploy and Monitor ML Pipelines with Open Source and Free Applications - Rami Krispin, Independent [Pre-Registration Required]

Tuesday July 2, 2024 19:00 - 20:00 CEST

Zoom

The workshop will focus on different deployment designs of machine learning pipelines using R, open-source applications, and free-tier tools. We will use the US hourly demand for electricity data from the EIA API to demonstrate the deployment of a pipeline with GitHub Actions and Docker that fully automates the data refresh process and generates a forecast on a regular basis. This includes the use of open-source tools such as pointblank to monitor the health of the data and the model's success. Last but not least, we will use a Quarto document to set up the monitoring dashboard and deploy it on GitHub Pages.

Speakers

Rami Krispin

Senior Manager Data Science and Engineering, Independent

Rami Krispin is a data science and engineering manager who mainly focuses on time series analysis, forecasting, and MLOps applications. He is the author of Hands-On Time Series Analysis with R and is currently working on his next book, Applied Time Series Analysis and Forecasting...

Tuesday July 2, 2024 19:00 - 20:00 CEST
Zoom

Virtual Tutorial, Programming

19:30 CEST

Performance Testing and Comparative Benchmarking for Data.Table - Doris Afriyie Amoakohene, Northern Arizona University

Tuesday July 2, 2024 19:30 - 19:50 CEST

The data.table package in R is a powerful tool for data analysis, combining efficient C code with user-friendly R syntax. To ensure its long-term sustainability, the NSF POSE program has funded a project from 2023 to 2025 to build a self-sustaining ecosystem around data.table. In this presentation, we will discuss the importance of performance testing in the development of data.table and present a general approach that can be applied to other R packages. By creating performance tests based on historical regressions, we can track the package's execution time and memory usage across versions, ensuring that code changes and new releases do not degrade its performance. We will demonstrate the use of the atime package to benchmark execution time and memory usage, providing developers with confidence in maintaining efficient performance and reliability. This approach not only benefits data.table but also serves as a model for other R package developers to enhance the performance and popularity of their own projects.

Speakers

Doris Afriyie Amoakohene

Performance Testing and Comparative Benchmarking for data.table, Northern Arizona University

Doris holds a BSc in Statistics and is currently pursuing a master's degree in Informatics at Northern Arizona University. She is the Founder and CEO of LAG Prestige Foundation. Additionally, Doris is a Research Assistant in a machine learning lab and actively involved...

Tuesday July 2, 2024 19:30 - 19:50 CEST
YouTube Premier

Virtual Session Presentation, Community

20:00 CEST

Health Economic Assessment Tool (HEAT) for Walking and Cycling (HeatR) - Thomas Gotschi, UC Berkeley & Tomasz Szwski, HEAT project

Tuesday July 2, 2024 20:00 - 20:20 CEST

The WHO's Health Economic Assessment Tool (HEAT) for walking and cycling is a publicly available health impact calculator employing R as its primary language. Originally an Excel sheet, it evolved into an HTML web tool before transitioning to R/Shiny six years ago for improved transparency and collaboration. A prototype of the application is accessible at https://heatwalkingcycling.org/HEAT_langs_dev/tool, soon to be integrated into the main production. Our development insights include tailored packages for translation and custom features. HEAT consists of three components: input data, health impact and carbon modeling (via `heatr`), and the UI, each utilizing distinct R libraries. Data integrity is upheld through workflows with `targets` and `assertr` packages, addressing issues like choosing between ISO 3166-1 alpha-3 and alpha-2. The UI, focusing on user-friendly data input options, leverages R Shiny, and inserts default inputs dynamically from spreadsheets. The custom-built `translator` package supports translation independently from the Shiny ecosystem. Our `templater` package streamlines text and reduces editorial errors, enabling conditioned templates with locale support.

Speakers

Thomas Gotschi

Visiting Research Associate, UC Berkeley

Dr. Thomas Götschi is an internationally recognized expert in sustainable transportation research with a focus on active transportation and related health aspects. He has led projects including developments of data collection apps and tools, innovative travel survey designs, analysis...

Tomasz Szwski

Programmer, HEAT project

Tomasz is a full-stack developer. With more than 12 years of experience as a professional programmer, he has participated in scientific projects in the field of environment and health modeling, focusing mainly on active modes of transport. He uses R, Shiny and JavaScript on a daily...

Tuesday July 2, 2024 20:00 - 20:20 CEST
YouTube Premier

Virtual Session Presentation, Programming

20:30 CEST

R Consortium's R-based Test Submission Package for FDA Evaluation - Joel Laxamana, Genentech

Tuesday July 2, 2024 20:30 - 20:50 CEST

In recent years, statisticians and analysts from the pharma industry and regulatory agencies have increased their adoption of open-source software such as R. R brings great benefits by providing a wealth of cutting-edge statistical tools, extension packages for interactive dashboards and documentation, as well as adaptability to the latest data science trends. However, publicly available drug submissions with R as the core analysis language have been lacking, which limits wider adoption within the pharma industry. The R Consortium R Submissions Working Group seeks to test the concept that an R-based submission can be bundled into a submission package and transferred successfully to FDA reviewers. As of May 2024, the R Consortium R Submissions Working Group has successfully completed three pilot submissions and received FDA CDER response letters. To our knowledge, these are the first publicly available submission packages that include components of open-source languages. In this talk, I will introduce the R Consortium R Submissions Working Group and the completed Pilot 1, 2 and 3 findings, the issues that we encountered, and our learnings, as well as current work being done in Pilot 4.

Speakers

Joel Laxamana

Data Science Project Lead, Genentech

Joel Laxamana is a Data Science Project Lead in the Product Development Data Sciences group at Genentech, a member of Roche. Joel joined in December of 2013 as an Analytical Data Scientist with a focus on Statistical Programming. He started in Oncology, working on both early and late...

Tuesday July 2, 2024 20:30 - 20:50 CEST
YouTube Premier

Virtual Session Presentation, Community

20:30 CEST

Causal Inference in R: The Whole Game - Malcolm Barrett, Stanford University [Pre-Registration Required]

Tuesday July 2, 2024 20:30 - 21:30 CEST

Zoom

In this tutorial, I’ll present an overview of our book, Causal Inference in R, freely available at r-causal.org. We’ll discuss the whole game, so to speak, of causal inference, following a few key steps: 1. Specify a causal question 2. Draw our assumptions using a causal diagram 3. Model our assumptions 4. Diagnose our models 5. Estimate the causal effect, and 6. Conduct sensitivity analysis on the effect estimate. We’ll discuss some new tools in the causal inference ecosystem, such as tipr, ggdag, propensity, halfmoon, and more, each making the act of causal inference easier and more principled.

Speakers

Malcolm Barrett

Research Software Engineer, Stanford University

Malcolm Barrett is an epidemiologist and research software engineer at Stanford University. After receiving his Ph.D. in epidemiology from the University of Southern California, he worked as a data scientist at Apple and Posit. His work has focused on causal inference methodology...

Tuesday July 2, 2024 20:30 - 21:30 CEST
Zoom

Virtual Tutorial, Statistical Methods

21:00 CEST

CANCELLED: Redefining Interactive Data with Quarto and WebR - James Balamuta, HJJB LLC

Tuesday July 2, 2024 21:00 - 21:20 CEST

With the release of webR, the barrier of entry to use R on the web has been significantly lowered. Through embedding webR within Quarto, a novel form of report interactivity has emerged, giving rise to pseudo-web applications. These applications span the spectrum from concise tutorials to intensive edge computing scenarios. In this talk, we'll discuss the transformative paradigm shift within interactive data science facilitated by the Quarto extension {quarto-webr}.

Speakers

James Balamuta

Dr., HJJB LLC

Dr. James J. Balamuta currently serves as the founder of HJJB, LLC, which offers specialized data science guidance and solutions to startups, Fortune 500 companies, and academia across the U.S. He holds a Ph.D. in Informatics from the University of Illinois Urbana-Champaign (UIUC...

Tuesday July 2, 2024 21:00 - 21:20 CEST
YouTube Premier

Virtual Session Presentation, Reporting

21:30 CEST

Oops All Solvers: Democratizing Access to Water Treatment Models Using R - Sierra Johnson, Brown and Caldwell

Tuesday July 2, 2024 21:30 - 21:50 CEST