In the past, we (Aadirupa, Eyke and Viktor) have dealt a lot with problems of preference learning, even long before preference learning took large language model training to a new level. This field has now become so broad that, in principle, there is a preference-based counterpart for almost all categories that are also distinguished for “conventional” machine learning. For example, there are preference-based variants for the methodological categories (in terms of supervision) of unsupervised, supervised, or reinforcement learning, but also for specific settings such as active and interactive learning, online and bandit learning, adversarial learning, or learning with data streams, just to name a few.
Even though good overview articles or even books exist for some of these areas, a compact monograph that synthesizes these fields and uses uniform notation and terminology is lacking. We want to change this and lay the groundwork for this project in a series of blog posts (inspired by all the excellent blogs on ML topics out there). The focus of our blog posts will be on getting to know the different preference-based learning subfields and understanding the core ideas, the mathematics, and the key design concepts of the current state-of-the-art algorithms for the respective field.
And, of course, we will also cover the currently very hot topics related to preference learning, such as AI alignment, Reinforcement Learning from Human Feedback (RLHF) and Large Language Models. In other words, if you are interested in these topics, then you’ve come to the right place. That being said, we also warmly invite anyone interested in preference learning in general to follow this site, give us feedback through comments on these pages, ask questions, suggest other topics, and not hesitate to criticize what we write.
We plan to update these blog posts on a regular basis. To avoid having to reload this page every time, we will notify you on our social media channels as soon as a new post is published. Therefore, you can simply follow one of the following accounts to stay up-to-date:
In the rest of this article, we will now give a little appetizer as to why someone should be interested in preference learning and look at a few examples of learning scenarios in which preferences play a central role.
Preference Learning – Why and how?
In the ever-evolving field of machine learning, staying at the forefront requires innovation. Machine learning models are only as good as the data they are trained on. The key to advancing this field is to make data collection not just more efficient, but more human-centered. This is where preference-based learning comes in: there is a compelling case for embracing preference feedback in machine learning, a shift that could change how we develop models. Incorporating preference feedback bridges the gap between cold, calculated algorithms and the inherently subjective human experience. It allows us to capture and integrate the richness of human preferences, making AI more intuitive and responsive to our needs.
Preferences have been studied for some time in various scientific disciplines, including economics and social sciences, operations research and decision sciences, psychology, and philosophy. Thinking about one’s own or someone else’s preferences can be traced back to antiquity [1], and this interest has only intensified in modern times. In the following, we look at a few examples in which the modeling, elicitation or acquisition of preferences plays an important or perhaps a supporting role. Each of the examples is representative of a specific sub-field of so-called preference-based machine learning, or preference learning for short, which in some cases has substantial overlaps with traditional fields of artificial intelligence research.
Election/Poll Results (Rank Aggregation)
In fact, as early as the 14th to 15th centuries (and probably several centuries before that), scholars were concerned with how to determine voter preferences based on a ranked list of candidates [2]. The scenario primarily responsible for this can be illustrated by the following picture:
Here, 5 people each rank 5 available political entities (candidates or parties) from first to last. Now, the question is how to aggregate these votes into an overall ranking that represents the votes appropriately, a so-called consensus ranking.
Nowadays, this problem is known as the rank aggregation problem [3], which deals with the process of combining multiple ranked lists into a single representative ranking. Here, the entities that have been ranked can be anything from automobiles to zoo shops. The consensus ranking is basically a mean value for rankings instead of real-valued vectors, and just as there are several approaches for specifying a mean value for real-valued vectors (e.g., arithmetic mean, median, mode), there are different ways to do the same for rankings.
For example, we could simply assign a score to each rank. In the voting example above, let us assume that
- rank 1 gets 4 points,
- rank 2 gets 3 points,
- and so on, down to rank 5, which gets 0 points.
Across the votes, the entities receive scores and can ultimately be sorted according to the total number of points, which leads to a ranking that can be seen as a consensus ranking:
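The positional scoring scheme described above is commonly known as the Borda count. The following is a minimal sketch of it; the five votes over the entities A to E are hypothetical and are not the ones from the picture, and the alphabetical tie-breaking is our own assumption.

```python
from collections import defaultdict

def borda_consensus(votes):
    """Aggregate complete rankings (best entity first) by positional scoring."""
    n = len(votes[0])
    scores = defaultdict(int)
    for ranking in votes:
        for position, entity in enumerate(ranking):
            # rank 1 gets n-1 points, ..., the last rank gets 0 points
            scores[entity] += n - 1 - position
    # sort by total points (descending); ties are broken alphabetically (assumption)
    consensus = sorted(scores, key=lambda e: (-scores[e], e))
    return consensus, dict(scores)

# Hypothetical votes: 5 voters, 5 entities, each list is ordered from first to last
votes = [
    ["A", "B", "C", "D", "E"],
    ["B", "A", "C", "E", "D"],
    ["A", "C", "B", "D", "E"],
    ["C", "A", "B", "E", "D"],
    ["A", "B", "D", "C", "E"],
]

consensus, scores = borda_consensus(votes)
print(consensus)  # consensus ranking, best first
print(scores)     # total points per entity
```

With 5 entities, each vote distributes 4 + 3 + 2 + 1 + 0 = 10 points, so the totals over 5 votes always sum to 50, which is a quick sanity check for such a scheme.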
Another possibility would be to consider a distance function for rankings and to obtain a consensus ranking as a ranking that has the smallest total distance to all existing rankings. As there are now a number of distances for rankings, there are, in turn, a number of possibilities for obtaining a consensus ranking using this procedure.
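To make the distance-based idea concrete, here is a small sketch that uses the Kendall tau distance (the number of entity pairs on which two rankings disagree) as one possible choice of distance; the resulting consensus is often called a Kemeny consensus. The votes are hypothetical, and the brute-force search over all permutations is only meant for a handful of entities.

```python
from itertools import combinations, permutations

def kendall_tau_distance(r1, r2):
    """Count the entity pairs that the two rankings order differently."""
    pos1 = {e: i for i, e in enumerate(r1)}
    pos2 = {e: i for i, e in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def distance_based_consensus(votes):
    """Return a ranking with minimal total distance to all votes (brute force)."""
    entities = sorted(votes[0])
    best = min(
        permutations(entities),
        key=lambda cand: sum(kendall_tau_distance(cand, v) for v in votes),
    )
    return list(best)

# Hypothetical votes, each ordered from first to last
votes = [
    ["A", "B", "C", "D", "E"],
    ["B", "A", "C", "E", "D"],
    ["A", "C", "B", "D", "E"],
    ["C", "A", "B", "E", "D"],
    ["A", "B", "D", "C", "E"],
]

consensus = distance_based_consensus(votes)
total = sum(kendall_tau_distance(consensus, v) for v in votes)
print(consensus, total)
```

Swapping in a different distance function for `kendall_tau_distance` yields a different notion of consensus, which is exactly the degree of freedom mentioned above.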
Now, given this example, one might ask what this whole setting has to do with learning. In fact, the problem under consideration is more of a representation problem, but one that occurs in many actual learning problems, as we will see later. Consider, for example, the k-nearest neighbors algorithm: for a collection of data points (namely the k nearest neighbors of a query data point), the labels are aggregated to form the final prediction. For classification, this aggregation is typically a majority vote, while for regression the arithmetic mean is typically used. However, if our labels are more complex objects such as rankings, the question automatically arises as to how they can be aggregated.
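As an illustration of this point, the sketch below plugs a rank aggregation step into k-nearest neighbors: the rankings attached to the k nearest training points are combined by positional scoring. The feature vectors, the labels a, b, c and the choice of positional scoring as aggregator are all assumptions made for the example.

```python
import math
from collections import defaultdict

def aggregate_rankings(rankings):
    """Positional scoring: first place gets n-1 points, last place gets 0."""
    n = len(rankings[0])
    scores = defaultdict(int)
    for ranking in rankings:
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    # ties are broken alphabetically (assumption)
    return sorted(scores, key=lambda l: (-scores[l], l))

def knn_predict_ranking(train, query, k=3):
    """train is a list of (feature_vector, ranking) pairs."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return aggregate_rankings([ranking for _, ranking in neighbors])

# Hypothetical training data: 2D features, each labeled with a ranking of a, b, c
train = [
    ((0.0, 0.0), ["a", "b", "c"]),
    ((0.1, 0.2), ["a", "c", "b"]),
    ((0.2, 0.1), ["a", "b", "c"]),
    ((5.0, 5.0), ["c", "b", "a"]),
    ((5.1, 4.9), ["c", "a", "b"]),
]

prediction = knn_predict_ranking(train, query=(0.1, 0.1), k=3)
print(prediction)
```

The only change compared to standard k-nearest neighbors is the aggregation function; any other consensus method could be used in its place.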
Voting Groups (Clustering)
In the last example, one could ask whether a single consensus ranking is actually suitable for representing the preferences of all voters. If the entire group of voters is fairly homogeneous in terms of preferences, one consensus ranking may be sufficient, but groups of voters are rarely homogeneous. It would be much more realistic to partition the entire heterogeneous group of voters into a finite number of homogeneous sub-groups and specify a consensus ranking for each of them.
If we express the problem in ML language, we want to cluster the available votes, our data points, such that data points in the same cluster are more similar to each other than to data points in other clusters. Of course, the voting example is again only one specific clustering problem, and in practice, all kinds of other instantiations of this more general problem can occur quite naturally [4]. Basically, this happens every time we have a collection of lists in which certain entities have been ranked by humans or perhaps nature itself: movie ratings, sports competitions, sushi ratings, … A collection of such data sets, very similar to the well-known UCI Machine Learning Repository, is Preflib.
Now, clustering is only one part of the larger field of unsupervised learning, which is classically considered for more standard data such as real-valued vectors. However, since data in the form of preferences also have a certain structure, other subfields of unsupervised learning can also be considered for preference data, such as Association Analysis [5] or Dimensionality Reduction [6].
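One simple way to cluster votes, sketched below, is a k-medoids style procedure with the Kendall tau distance between rankings. This is just one possible approach, chosen here for brevity; the votes and the fixed initial medoids are hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(r1, r2):
    """Count the entity pairs that the two rankings order differently."""
    pos1 = {e: i for i, e in enumerate(r1)}
    pos2 = {e: i for i, e in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def k_medoids(rankings, initial_medoids, max_iter=20):
    """Cluster rankings around medoids (indices into `rankings`)."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        # assignment step: each ranking joins its closest medoid
        clusters = [[] for _ in medoids]
        for idx, ranking in enumerate(rankings):
            dists = [kendall_tau_distance(ranking, rankings[m]) for m in medoids]
            clusters[dists.index(min(dists))].append(idx)
        # update step: the new medoid minimizes the total distance within its cluster
        new_medoids = []
        for current, members in zip(medoids, clusters):
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(current)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(
                    kendall_tau_distance(rankings[c], rankings[j]) for j in members
                ),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters, medoids

# Hypothetical votes forming two groups with roughly opposite preferences
votes = [
    ["A", "B", "C", "D"],
    ["A", "B", "D", "C"],
    ["B", "A", "C", "D"],
    ["D", "C", "B", "A"],
    ["D", "C", "A", "B"],
    ["C", "D", "B", "A"],
]

clusters, medoids = k_medoids(votes, initial_medoids=[0, 3])
print(clusters, [votes[m] for m in medoids])
```

Each medoid can be read as the consensus ranking of its sub-group, which connects this setting back to the rank aggregation example.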
Search Engine Optimization (Learning to Rank)
With the introduction of special technologies in modern times, preferences have also sparked great interest in artificial intelligence (AI), as use cases for learning preferences have arisen almost automatically. Let’s take the daily business of search engines as a simple example. Here, a user enters a search term in textual form (i.e., submits a query), and the job of the search engine is to return search results (e.g., documents, websites, …) that are deemed highly relevant to the entered search term.
The illustration above, for example, shows the (extremely simplified) situation where a user enters the search term “acoustic guitar” and there are 6 documents to choose from, each of which deals with a musical instrument, namely a saxophone, a banjo, an acoustic guitar, an electric guitar, a xylophone and a violin. The search engine now returns the 6 documents in order of expected relevance for the query. The “true” relevance here results from the similarity of the respective musical instruments to the acoustic guitar. In the above case, the search engine returns the order according to the “true” similarity.
The learning problem behind this is to learn a preference function over all possible search items for a given query. The preference function indicates the extent to which one search item is likely to be more relevant than another with respect to the query. This can then be used to output an ordered collection of search items for an input query. The branch of the scientific literature that deals with this kind of problem is called Learning to Rank [7], a fundamental part of Information Retrieval.
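As a rough sketch of what such a preference function could look like, the code below fits a linear scoring function with a pairwise logistic loss and then sorts the items by score. This is only one of many possible approaches and not a description of how actual search engines work; the two features per document (query-term overlap and a popularity score) and all numbers are made up for illustration.

```python
import math

def train_pairwise_ranker(features, preferences, epochs=100, lr=0.5):
    """Fit weights w so that w·x is larger for the preferred item of each pair."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preferences:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # gradient step on the logistic loss log(1 + exp(-margin))
            coeff = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * coeff * di for wi, di in zip(w, diff)]
    return w

def preference_probability(w, x_a, x_b):
    """Estimated probability that item a is more relevant than item b."""
    margin = sum(wi * (a - b) for wi, a, b in zip(w, x_a, x_b))
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical documents for the query "acoustic guitar":
# (query-term overlap, popularity score), both invented for this example
features = {
    "acoustic guitar": (0.9, 0.05),
    "electric guitar": (0.7, 0.10),
    "banjo": (0.5, 0.00),
    "saxophone": (0.1, 0.08),
}
# Assumed ground-truth order, most relevant first
true_order = ["acoustic guitar", "electric guitar", "banjo", "saxophone"]
# Training signal: every pair (more relevant, less relevant)
preferences = [
    (true_order[i], true_order[j])
    for i in range(len(true_order))
    for j in range(i + 1, len(true_order))
]

w = train_pairwise_ranker(features, preferences)
ranking = sorted(
    features,
    key=lambda d: -sum(wi * xi for wi, xi in zip(w, features[d])),
)
print(ranking)
print(preference_probability(w, features["acoustic guitar"], features["saxophone"]))
```

The pairwise probabilities play the role of the preference function, and sorting by the underlying score turns them into an ordered result list.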
Image Recognition (Preferences for facilitating classic learning scenarios)
The above examples have so far been “pure” preference learning tasks in the sense that the available data was already given in a specific preference representation. In the voting example, these were rankings, and in the search engine example, these were relevance vectors (we will discuss the explicit representations in more detail in later blog posts). Next, we will look at classic learning problems from machine learning, which at first glance have nothing to do with preferences. Classic here means that the actual target variable is either a class or a numerical value for which it is difficult to obtain exact explicit values. We will see that preferences can then be used as an additional source of information to support and facilitate classic learning tasks by providing a weak signal about the actual target variable.
Consider the case of classifying pictures of foxes, dogs and wolves:
It is not always easy to classify the individual images as “fox”, “dog” or “wolf”, as it is not always clear what the correct label is. In particular, the distinction between dogs and wolves is quite difficult in some cases, as some pictures show characteristics of both.
However, if instead of having to specify the labels we only have to say, for each pair of images, which of them fits better into a given category, the task feels considerably simpler. Effectively, we would be indicating a preference over pairs: I consider one of the images a better fit for the “dog” category than the other, as in the following picture:
In light of this, an interactive classification learning scenario can be considered in which two feedback modalities are available: noisy labels and pairwise comparisons [8].
Such types of examples can also be found for regression tasks in addition to classification tasks. Imagine you are trying to infer the age of a person from their face image, e.g. for the following pictures:
This is arguably a difficult undertaking for some pictures, and you will certainly rarely guess the correct age. Again, if we look at pairs of pictures instead and indicate which person is older, the task is much easier. This can be helpful in a semi-supervised regression setting, where pairwise comparisons between samples provide additional information for unlabeled samples [9].
Another interesting application where preference-based information in the form of comparisons can be helpful is monocular depth estimation. This is a relevant task in the realm of computer vision and arises in subproblems such as 2D-to-3D conversion or 3D modeling.
Picture is taken from here.
It is natural here to regard the problem again as a regression problem and to learn a depth map (see the gray images above). For this task, however, relative information at the per-pixel or per-object level can also be added to make the estimation potentially more precise. For example, pairwise comparisons of this form can be sampled from a depth map as additional training information, which has been done in several works for more accurate depth estimation [10, 11]. In a similar way, an entire listwise ranking can also be extracted directly:
The picture is taken from here.
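To illustrate how relative depth information can be derived from a depth map, here is a minimal sketch that samples pairwise comparisons between random pixels and also extracts a listwise ranking of all pixels. The tiny 2×2 depth map, the label encoding (+1, -1, 0) and the tolerance parameter are assumptions for this example and are not taken from the cited works.

```python
import random

def sample_ordinal_pairs(depth, num_pairs, tolerance=0.0, seed=0):
    """Sample pixel pairs and label which of the two pixels is farther away."""
    rng = random.Random(seed)
    height, width = len(depth), len(depth[0])
    pairs = []
    for _ in range(num_pairs):
        p = (rng.randrange(height), rng.randrange(width))
        q = (rng.randrange(height), rng.randrange(width))
        d = depth[p[0]][p[1]] - depth[q[0]][q[1]]
        if d > tolerance:
            label = 1    # p is farther away than q
        elif d < -tolerance:
            label = -1   # p is closer than q
        else:
            label = 0    # (approximately) equal depth
        pairs.append((p, q, label))
    return pairs

def listwise_ranking(depth):
    """Order all pixel coordinates from nearest to farthest."""
    pixels = [(i, j) for i in range(len(depth)) for j in range(len(depth[0]))]
    return sorted(pixels, key=lambda ij: depth[ij[0]][ij[1]])

# Hypothetical depth map (larger value = farther away)
depth = [
    [1.0, 2.0],
    [4.0, 3.0],
]

pairs = sample_ordinal_pairs(depth, num_pairs=5)
ranking = listwise_ranking(depth)
print(pairs)
print(ranking)
```

Such comparisons only carry relative information, so they can be generated even when absolute depth values are unreliable, as long as the ordering of depths is trustworthy.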
This blog post was only a small appetizer, illustrating where we can encounter preference learning, from quite old but still relevant topics like choice theory to nowadays omnipresent tools like search engines. We have also seen how preference learning techniques can be of great help for classical learning tasks such as classification or regression.
Stay tuned for our next blog post, where we will discuss more interesting applications and combine them with existing ML fields such as reinforcement learning or large language models.
1. Aristotle. Topics, Book III, 384-322 BC.
2. Günter Hägele and Friedrich Pukelsheim. Llull’s writings on electoral systems. Studia Lulliana, 41(97):3–38, 2001.
3. Shili Lin. Rank aggregation methods. Wiley Interdisciplinary Reviews: Computational Statistics, 2(5):555–570, 2010.
4. Thomas Brendan Murphy and Donal Martin. Mixtures of distance-based models for ranking data. Computational Statistics & Data Analysis, 41(3-4):645–655, 2003.
5. Sascha Henzgen and Eyke Hüllermeier. Mining rank data. In International Conference on Discovery Science, pages 123–134. Springer, 2014.
6. Mastane Achab, Anna Korba, and Stephan Clémençon. Dimensionality reduction and (bucket) ranking: A mass transportation approach. In Algorithmic Learning Theory, pages 64–93. PMLR, 2019.
7. Hang Li. A short introduction to learning to rank. IEICE Transactions on Information and Systems, 94(10):1854–1862, 2011.
8. Yichong Xu, Hongyang Zhang, Kyle Miller, Aarti Singh, and Artur Dubrawski. Noise-tolerant interactive learning using pairwise comparisons. Advances in Neural Information Processing Systems, 30:2431–2440, 2017.
9. Yichong Xu, Sivaraman Balakrishnan, Aarti Singh, and Artur Dubrawski. Regression with comparisons: Escaping the curse of dimensionality with ordinal information. The Journal of Machine Learning Research, 21(1):6480–6533, 2020.
10. Daniel Zoran, Phillip Isola, Dilip Krishnan, and William T Freeman. Learning ordinal relationships for mid-level vision. In Proceedings of the IEEE International Conference on Computer Vision, pages 388–396, 2015.
11. Ke Xian, Chunhua Shen, Zhiguo Cao, Hao Lu, Yang Xiao, Ruibo Li, and Zhenbo Luo. Monocular relative depth perception with web stereo data supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 311–320, 2018.