Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning

Courselet Content

2 components

	Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning (video)	14 min
	Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning (pdf)	2.7 K

Requirements

None

General Overview

Description

This paper explores how to automatically measure the quality of human-generated summaries, based on a Norwegian corpus of real estate condition reports and their corresponding summaries. The proposed approach proceeds in two steps. First, the real estate reports and their associated summaries are automatically labelled using a set of heuristic rules gathered from human experts and aggregated using weak supervision. Second, the aggregated labels are used to train a neural model that takes a document and its summary as inputs and outputs a score reflecting the predicted quality of the summary. The neural model maps the document and its summary to a shared "summary content space" and computes the cosine similarity between the two resulting embeddings to predict the final summary quality score. The best performance is achieved by a CNN-based model with an accuracy of 89.5% (measured against the aggregated labels obtained via weak supervision), compared to 72.6% for the best unsupervised model. Manual inspection of examples indicates that the weak supervision labels do capture important indicators of summary quality, but the correlation of those labels with human judgements remains to be validated. Our models of summary quality predict that approximately 30% of the real estate reports in the corpus have a summary of poor quality.
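The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three heuristic rules, the keyword list, and the function names are hypothetical, and a simple majority vote stands in for the weak supervision aggregation, which the description does not specify. The cosine similarity function shows only the final scoring step; in the paper the two vectors would come from the trained neural encoders.

```python
import math
from collections import Counter

# Label values used by the hypothetical heuristic rules below.
POOR, GOOD, ABSTAIN = 0, 1, -1


def lf_too_short(report, summary):
    """Hypothetical rule: a very short summary is probably poor."""
    return POOR if len(summary.split()) < 5 else ABSTAIN


def lf_covers_report_terms(report, summary):
    """Hypothetical rule: a summary built mostly from report terms is good."""
    report_terms = set(report.lower().split())
    summary_terms = set(summary.lower().split())
    if not summary_terms:
        return POOR
    overlap = len(report_terms & summary_terms) / len(summary_terms)
    return GOOD if overlap >= 0.5 else ABSTAIN


def lf_omits_defect(report, summary):
    """Hypothetical rule: a defect named in the report must appear in the summary."""
    keywords = {"moisture", "rot", "leak"}  # illustrative keyword list
    in_report = keywords & set(report.lower().split())
    in_summary = set(summary.lower().split())
    return POOR if in_report - in_summary else ABSTAIN


LABELLING_FUNCTIONS = [lf_too_short, lf_covers_report_terms, lf_omits_defect]


def aggregate(votes):
    """Majority vote over non-abstaining rules (a stand-in for the
    weak supervision aggregation). Ties and all-abstain give ABSTAIN."""
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts:
        return ABSTAIN
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


def weak_label(report, summary):
    """Apply every rule to a (report, summary) pair and aggregate the votes."""
    return aggregate([lf(report, summary) for lf in LABELLING_FUNCTIONS])


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In this sketch, `weak_label(report, summary)` would produce the training target for a pair, and `cosine_similarity(doc_vec, sum_vec)` would turn the two learned embeddings into a quality score at prediction time.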

Meet the instructors!

Arild Næss

About the Instructor

Arild Brandrud Næss is an associate professor of statistics at NTNU Business School. His research focuses mainly on natural language processing (NLP), with particular emphasis on applications within finance and economics. Næss received his MSc in Industrial Mathematics from NTNU's Department of Mathematical Sciences and his PhD in speech technology from NTNU's Department of Electronic Systems. He has also spent two years as a visiting researcher at the Toyota Technological Institute at Chicago.

Student's feedback

4.7

Courselet Rating

