The release of ChatGPT by OpenAI in 2022 excited the world and suggested to many that generative AI may power the next major expansion of the world economy. Before ChatGPT was released, AI and machine learning were already in routine use, for example to suggest products to Amazon customers based on their shopping history, and to translate text from one language to another in Google Translate.
However, for all the progress in AI, many theoretical developments are still required. For example, large language models can use billions of parameters, require huge amounts of energy to train, and sometimes produce hallucinations (confidently stated but wrong answers) to questions. Perhaps simpler AI and machine learning techniques, trained on smaller curated data sets, may be better. There are also important research directions and ethical issues, such as explainability in AI and checking for biases in training data sets. Can the potential computational speed-up from quantum computers be used for AI?
This cross-cutting theme in theoretical foundations for data science and artificial intelligence aims to explore the above issues and questions.
 

Natural Language Processing

Telecom services are at the core of the everyday needs of today’s societies. Numerous online forums and discussion platforms allow customers to give feedback about these services, often in the form of written comments (free text). Natural Language Processing (NLP) tools can be used to process the collected free text, which has a number of potential uses. For example, features extracted from the free text can be used in classifiers that predict when a customer will cancel their subscription. If the large volume of comments can be distilled into a few themes using topic modelling or summarization, providers can use their customers’ feedback to improve their services.
The data set included customer reviews of telecom services from TrustPilot and other online sources. A key part of the project was the use of word embeddings to convert the text to vectors. Nine state-of-the-art word embedding techniques were considered, including BERT, Word2Vec and Doc2Vec. To extract summaries, clustering techniques and topic modelling were compared. These NLP techniques allow a more sophisticated analysis than summarizing free text with word clouds.
Abdelmotaleb, H., Wojtyś, M. and McNeile, C., 2023. A Comparison of a Novel Optimized GSDMM Model with K-Means Clustering For Topic Modelling Of Free Text. Journal of Machine Intelligence and Data Science (JMIDS), 4(1), pp.52-62.
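As a rough illustration of the general workflow described above (convert free text to vectors, cluster the vectors, then read off the dominant terms of each cluster as a theme), here is a minimal sketch in plain Python. It is not the project's method: it swaps in simple TF-IDF vectors in place of the embedding techniques named above (BERT, Word2Vec, Doc2Vec) and uses a small cosine-similarity k-means with a deterministic farthest-point initialisation. The reviews, the stop-word list and all function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reviews, invented for illustration (not the project data set).
REVIEWS = [
    "broadband speed is slow and the connection drops every evening",
    "slow broadband connection and the speed drops at night",
    "the bill was wrong and I was charged twice this month",
    "charged twice on my bill and the refund was wrong",
]
# Tiny hand-picked stop-word list (an assumption; real projects use a fuller one).
STOP_WORDS = {"the", "and", "is", "was", "i", "my", "on", "at", "this", "every"}


def tfidf_vectors(docs):
    """Map each document to a unit-length sparse TF-IDF vector (word -> weight)."""
    tokenised = [[w for w in d.lower().split() if w not in STOP_WORDS] for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(w for tokens in tokenised for w in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed inverse document frequency: rarer words get larger weights.
        vec = {w: c * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
               for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, n_iter=10):
    """Cosine-similarity k-means with deterministic farthest-point initialisation."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the vector least similar to every centroid chosen so far.
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(nxt))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the previous centroid if a cluster is empty
            total = defaultdict(float)
            for v in members:
                for w, x in v.items():
                    total[w] += x
            centroids[j] = {w: s / len(members) for w, s in total.items()}
    return labels, centroids


def top_terms(centroid, n=4):
    """Highest-weighted words of a centroid, used as a crude theme label."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


vectors = tfidf_vectors(REVIEWS)
labels, centroids = kmeans(vectors, k=2)
for j, centroid in enumerate(centroids):
    print(f"cluster {j}: {top_terms(centroid)}")
```

On this toy input, the two broadband reviews and the two billing reviews should fall into separate clusters, and the top terms of each centroid act as a short theme description. With real review data, the TF-IDF step would typically be replaced by one of the embedding models, and the number of clusters would need to be chosen and validated rather than fixed at two.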

MSc projects related to natural language processing 

We also supervise MSc projects related to Natural Language Processing and its various applications, including: 
  • using a novel Parliamentary Rules database (parlrulesdata.org) to develop a method for automatically tracking articles of parliamentary rules between the years 1811 and 2022 using word embeddings and machine learning clustering methods;
  • a project in collaboration with Royal Cornwall Hospitals NHS Trust to analyse free text from staff survey results through Natural Language Processing;
  • leveraging Natural Language Processing to discover meaningful topics in board games’ user reviews;
  • a comparative analysis of word embeddings and their biases in empathy detection;
  • topic modelling with Latent Dirichlet Allocation on a mining journal to extract useful and meaningful insights.
Big data group