Big Data seminars

Find out more about our seminars

We run a series of seminars in statistics, data science and artificial intelligence.

Vine copula-based regression models

Speaker: Professor Claudia Czado (Technical University of Munich)

Date: Wednesday 24 April 2024, 11:00

Venue: Room 101, 2–5 Kirkby Place (KPL)

Title: Vine copula-based regression models

Abstract: Vine copulas can also be used to construct flexible classes of regression models that can accommodate non-linear, non-Gaussian dependence. This will include models for univariate and bivariate responses. For this, the conditional distribution of the response given the covariates will be derived from a joint vine copula model of the response and covariates. This provides a distribution regression model that allows simple determination of conditional quantiles. The vine copula-based regression models are constructed in such a way that the conditional density can be written explicitly without integration. This also allows us to develop a forward selection strategy to avoid overfitting. These approaches will be introduced, and an application to assessing risks in flight landings will be given.
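As a rough illustration of how a conditional quantile can be read off a copula in closed form, the sketch below uses a single bivariate Gaussian pair copula, the simplest building block of a vine. This is an assumption made purely for illustration and is not the speaker's model: the function name `gaussian_copula_conditional_quantile` and the dependence parameter `rho` are hypothetical, and a real vine copula regression would combine several pair copulas, typically including non-Gaussian ones.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the copula transforms


def gaussian_copula_conditional_quantile(alpha, u, rho):
    """Conditional alpha-quantile of V given U = u under a bivariate Gaussian copula.

    Everything is on the copula (uniform) scale: u and the result lie in (0, 1).
    The formula is the closed-form inverse h-function of the Gaussian copula,
    so no numerical integration is needed.
    """
    if not (0 < alpha < 1 and 0 < u < 1 and -1 < rho < 1):
        raise ValueError("alpha and u must lie in (0, 1), rho in (-1, 1)")
    z_u = _STD_NORMAL.inv_cdf(u)
    z_alpha = _STD_NORMAL.inv_cdf(alpha)
    return _STD_NORMAL.cdf(rho * z_u + (1 - rho ** 2) ** 0.5 * z_alpha)


# Illustrative values only: covariate at its 80th percentile, rho = 0.6.
for alpha in (0.05, 0.5, 0.95):
    q = gaussian_copula_conditional_quantile(alpha, u=0.8, rho=0.6)
    print(f"alpha={alpha:.2f}: conditional quantile on the uniform scale = {q:.3f}")
```

To move from the uniform scale back to the response scale, one would apply the inverse of the response's marginal distribution function to the returned value.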

Leveraging AI for Future Cyber Defence Research in BT

Speaker: Sri Kalidass (BT)

Date: Friday 22 March 2024, 11:00–13:00

Venue: DYB LT

Title: Leveraging AI for Future Cyber Defence Research in BT

Abstract: More than 15 billion devices are connected to the internet and that’s almost twice the world’s total population – 8.1 billion. The increase in devices, advances in technology (software & hardware), and easy access are not only benefiting humanity but also posing threats, especially due to the exponential growth and influence of AI. For instance, cybercriminals are leveraging the latest technology and posing more threats to nations across the globe. It is estimated that cybercrime cost the global economy around $7 trillion in 2022.

At BT, we continuously evolve to defend and protect the network from cyber attacks by leading and developing world-leading research. This talk is designed to give you a flavour of the following:

Future Cyber Defence Research in BT: how we leverage Artificial Intelligence (AI) for automated detection and response to cyber attacks
LLM-Fever & cyber landscape
AI regulation – global & co-ordinated effort!

A Gentle Introduction to Fine-tuning Large Language Models

Speaker: Oscar Ponce (University of Plymouth)

Date: Thursday 14 December 2023, 11:00

Title: A Gentle Introduction to Fine-tuning Large Language Models

Abstract: In contrast to traditional Natural Language Processing (NLP) techniques, Large Language Models (LLMs) have achieved remarkable success in various NLP tasks including question-answering, summarization, named-entity recognition, and machine translation. Fine-tuning these pre-trained LLMs for specific tasks and domains enhances the model’s performance. This seminar will cover the most common fine-tuning techniques and their computational power requirements.

Topic Modelling in the era of Large Language Models (LLMs)

Speaker: Bayode Ogunleye (University of Brighton)

Date: Wednesday 29 November 2023, 14:00

Venue: BGB 407, University of Plymouth

Title: Topic Modelling in the era of Large Language Models (LLMs)

Abstract: In this era of big data, understanding and distilling large volumes of textual data remains a paramount challenge. Topic modelling (TM) stands out as a key method for the automated extraction of coherent themes across various applications including sentiment analysis and recommendation systems. It is considered a branch of machine learning (ML) and natural language processing (NLP), aiding in the revelation of underlying topical patterns. In this seminar, we will delve into innovative paradigms for extracting textual themes and structure. Specifically, we will discuss state-of-the-art (SOTA) algorithms including bidirectional encoder representations from transformers (BERT) and its variants for enhanced text representation and understanding. Additionally, we'll examine the role of TM in sentiment analysis, document summarisation and recommender systems.

Statistics vs the Apocalypse. Addressing real world questions in teaching

Speaker: Professor Jim Ridgway (Durham University)

Date: 7 November 2023, 16:00–17:00

Venue: Rolle 018, University of Plymouth

Title: Statistics vs the Apocalypse. Addressing real world questions in teaching

Abstract: Humanity faces some apocalyptic threats – climate change, epidemics and war. These existential threats dance together in unholy ways with simmering problems such as poverty, migration, inequality and racism. Further, the capacity for co-ordinated responses is under threat. The ecology of data is changing dramatically: yesterday’s data deluge is today’s millpond. Democracy may be under direct threat from misinformation and disinformation – some from government itself – and the use and misuse of AI. So, what can be done? A central concern is empowerment: can we develop a desire to know, the ability to find out, and the willingness to act on findings? Finding out requires knowledge about authoritative sources, and skills to interpret evidence appropriately. The good news is that there is a wealth of well-curated, authentic, multivariate data relevant to real world issues that is presented in ways that let users ask their own questions. The less-good news is that interpreting data can be hard. We will start by exploring and answering some fundamental problems in science that arise from a naïve application of statistics – namely, the replication crisis in psychology, and the problem of sample bias in medical science. The lecture will then explore some data sets with user-friendly interfaces appropriate for the classroom, on topics that include climate change and disease spread. We will discuss the statistical knowledge and interpretative skills needed to draw sensible and trustworthy conclusions. The message will be that engaging with authentic data on socially important topics is empowering for teachers and students, and can lead to better teaching and learning of statistics.

A Unifying and Flexible Copula Regression Modelling Framework

Speaker: Giampiero Marra (University College London)

Date: Wednesday 7 June 2023, 13:00–14:00

Venue: Zoom

Title: A Unifying and Flexible Copula Regression Modelling Framework

Abstract: Motivated by the need in many fields to model joint outcomes, we present a unifying and flexible (bivariate) copula regression framework that is capable of handling peculiar shapes of response data via a vast range of marginal distributions, allows for a wide variety of copula dependence structures, and permits all model parameters (including the dependence parameters) to be specified as functions of flexible covariate effects. Fitting the models within this framework can be a challenging task in practice. To this end, parameter estimation is carried out via a carefully structured algorithm based on a computationally efficient and stable penalised maximum likelihood estimation approach. The modelling framework is available via the R package GJRM, which is very easy and intuitive to use. The methodology will be illustrated by discussing survival and health economics-related case studies.

Entrywise Preservers for Classes of Positive Matrices

Speaker: Alexander Belton (University of Plymouth)

Date: Wednesday 17 May 2023, 15:00 (face-to-face)

Venue: Rolle 116, University of Plymouth

Title: Entrywise Preservers for Classes of Positive Matrices

Abstract: We all know the correct way to multiply matrices, but it is also possible to treat them as simple arrays of numbers and perform algebraic operations entry by entry. For multiplication, this is called the Hadamard product.

It may seem surprising, but the Hadamard product preserves the collection of matrices that are positive semidefinite: those real symmetric matrices with non-negative eigenvalues. It follows immediately that applying any absolutely monotonic function entrywise also preserves this form of positivity. (A function is absolutely monotonic if its Maclaurin series has non-negative coefficients.) Rather more work is required to show that the converse is true: a function which preserves positive semidefiniteness when applied entrywise to matrices of arbitrary size is necessarily absolutely monotonic.

The situation is more complex for matrices of a fixed size, or when the class of matrices under study has some other form of positivity or possesses additional structure, such as Hankel or Toeplitz matrices. This talk will discuss results for some of these situations.

This is joint work with Dominique Guillot (University of Delaware), Apoorva Khare (Indian Institute of Science, Bangalore) and Mihai Putinar (University of California at Santa Barbara and Newcastle University).
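The two positivity statements in the abstract above can be checked numerically on small examples. The sketch below is only a spot check under stated assumptions, not a proof and not the speaker's method: it builds random positive semidefinite matrices as Gram matrices, then evaluates the quadratic form of the Hadamard product, and of the entrywise exponential (an absolutely monotonic function), at many random vectors. All helper names are hypothetical.

```python
import math
import random


def gram(rows):
    """Gram matrix of the given row vectors; always positive semidefinite."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def hadamard(a, b):
    """Entrywise (Hadamard) product of two matrices of the same size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply_entrywise(f, a):
    """Apply the scalar function f to every entry of the matrix a."""
    return [[f(x) for x in row] for row in a]


def quadratic_form(m, v):
    """Compute v^T m v; non-negative for every v when m is positive semidefinite."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def min_quadratic_form(m, trials, rng):
    """Smallest value of v^T m v over a number of random Gaussian vectors v."""
    n = len(m)
    return min(
        quadratic_form(m, [rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )


rng = random.Random(0)  # fixed seed so the spot check is repeatable
n = 4
a = gram([[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)])
b = gram([[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)])

min_hadamard = min_quadratic_form(hadamard(a, b), 2000, rng)
min_exp = min_quadratic_form(apply_entrywise(math.exp, a), 2000, rng)

print("Hadamard product: smallest quadratic form found =", min_hadamard)
print("Entrywise exp:    smallest quadratic form found =", min_exp)
```

A negative value (beyond rounding error) would indicate a matrix that is not positive semidefinite; for instance, the symmetric matrix with rows (1, 2) and (2, 1) gives a quadratic form of -2 at the vector (1, -1).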

Pandemic Data Quality Modelling: A Bayesian Approach

Speaker: Giancarlo Manzi (University of Milan)

Date: Wednesday 26 April 2023, 11:00

Venue: Rolle 115, University of Plymouth

Title: Pandemic Data Quality Modelling: A Bayesian Approach

Abstract: When pandemics such as that of Covid-19 spread globally, the rapidly evolving situation compels officials and executives to give prompt responses and adapt policies depending on the current state of the disease. In this context, it is crucial for policy makers always to have a firm grasp on the current state of the pandemic, and to envision how the number of infections is going to evolve over the next weeks. However, as in many other situations involving compulsory registration of sensitive data, cases are reported erroneously, often with delays that prevent an up-to-date view of the state of things. Errors in reporting new cases affect mortality reporting, resulting in excess deaths appearing in official statistics months later. We provide tools for evaluating the quality of epidemic mortality data. We accomplish this through a series of Bayesian models accounting for the excess mortality that epidemics might bring relative to the normal level of mortality in the population.

Using linked administrative data to aid the handling of non-response and restore sample representativeness in cohort studies: the 1958 National Child Development Study and Hospital Episode Statistics data

Speaker: Richard Silverwood (University College London), Associate Professor of Statistics at the Centre for Longitudinal Studies within the UCL Social Research Institute

Date: 21 April 2023, 14:00–15:00

Venue: Zoom

Title: Using linked administrative data to aid the handling of non-response and restore sample representativeness in cohort studies: the 1958 National Child Development Study and Hospital Episode Statistics data

Abstract: There is growing interest in whether linked administrative data have the potential to aid analyses subject to missing data in cohort studies. Using linked 1958 National Child Development Study (NCDS) and Hospital Episode Statistics (HES) data, we applied a LASSO variable selection approach to identify HES variables that are predictive of non-response at the age 55 sweep of NCDS. We then included these variables as auxiliary variables in multiple imputation (MI) analyses to explore the extent to which they helped restore sample representativeness of the respondents together with the imputed non-respondents in terms of early life variables (mother’s husband’s social class at birth, cognitive ability at age 7) and relative to external population benchmarks (educational qualifications at age 55, marital status at age 55). We identified 10 HES variables that were predictive of non-response at age 55 in NCDS. For example, cohort members who had been treated for adult mental illness were more than 70% more likely to be non-respondents (risk ratio 1.73; 95% confidence interval 1.17, 2.51). Inclusion of these HES variables in MI analyses only helped to restore sample representativeness to a limited extent. Furthermore, there was essentially no additional gain in sample representativeness relative to analyses using only previously identified survey predictors of non-response (i.e. NCDS rather than HES variables). Since we are some of the first people to use this linked data resource, I will also take a detour into examining the linkage quality and sample representativeness.

Data Analytics at Royal Cornwall Hospitals NHS Trust

Speaker: Brandon Chapman (Royal Cornwall Hospitals NHS Trust)

Date: 29 March 2023, 13:00–14:00

Venue: Room 101, 2–5 Kirkby Place, University of Plymouth

Title: Data Analytics at Royal Cornwall Hospitals NHS Trust

Abstract: This presentation will give an overview of the services that the Information and Business Intelligence Department provides for Royal Cornwall Hospitals NHS Trust (RCHT) and across the local healthcare system. The different types of roles and skillsets within the department will be introduced. The function of the newly formed data science/modelling teams will be explained along with progress so far, with some example projects we have completed or are currently undertaking. Finally, we will discuss the types of projects that we would like to undertake in future, and end the presentation with questions.

Using Electronic Health Records for Scientific Research: Promises and Perils

Speaker: Bhramar Mukherjee (University of Michigan)

Date: 10 March 2023, 16:00–17:00

Title: Using Electronic Health Records for Scientific Research: Promises and Perils

Abstract: Electronic Health Records (EHR) linked with other auxiliary data sources hold tremendous potential for conducting real-time actionable research. However, one has to answer two fundamental questions before conducting inference: "Who is in my study?" and "What is the target population of inference?". Without accounting for selection bias, one can quickly reach inaccurate conclusions. In this talk, I will discuss a statistical framework for jointly considering selection bias and phenotype misclassification in analysing EHR data. Examples will include genome and phenome-wide association studies of cancer and COVID-19 outcomes using data from the Michigan Genomics Initiative and the UK Biobank. This is joint work with Lars Fritsche, Lauren Beesley and Maxwell Salvatore at the University of Michigan School of Public Health.