Cross-cutting theme in theoretical foundations for data science and artificial intelligence

The release of ChatGPT by OpenAI in 2022 excited the world and suggested to many that generative AI may power the next major expansion of the world economy. Before the release of ChatGPT, AI and machine learning were already in routine use, for example by Amazon to suggest products to customers based on their shopping history, and by Google Translate to translate text from one language to another.

However, despite all the progress in AI, many theoretical developments are still required. For example, large language models can have billions of parameters, require huge amounts of energy to train, and sometimes produce hallucinations (plausible-sounding but incorrect answers to questions). Perhaps simpler AI and machine learning techniques, trained on smaller curated data sets, may be better. There are also important research directions and ethical issues, such as explainability in AI and checking for biases in training data sets. Can the potential computational speed-up from quantum computers be exploited for AI?

This cross-cutting theme in theoretical foundations for data science and artificial intelligence aims to explore the above issues and questions.

MSc projects related to natural language processing

We also supervise MSc projects related to Natural Language Processing and its various applications, including:

using a novel Parliamentary Rules database (parlrulesdata.org) to develop a method for automatically tracking articles of parliamentary rules between the years 1811 and 2022 using word embeddings and machine learning clustering methods;

a project in collaboration with Royal Cornwall Hospitals NHS Trust to analyse free text from staff survey results through Natural Language Processing;

leveraging Natural Language Processing to discover meaningful topics in user reviews of board games;

a comparative analysis of word embeddings and their biases in empathy detection;

topic modelling with Latent Dirichlet Allocation on a mining journal to uncover useful and meaningful insights.

Related research

Big Data Group

Big data in energy, environment and sustainability is closely integrated with the Big Data Group, who aim to contribute to the Big Data technological revolution by using it to address present and future societal and economic challenges.