Universal expressions of population change by the Price equation: Natural selection, information, and maximum entropy production

Abstract The Price equation shows the unity between the fundamental expressions of change in biology, in information and entropy descriptions of populations, and in aspects of thermodynamics. The Price equation partitions the change in the average value of a metric between two populations. A population may be composed of organisms or particles or any members of a set to which we can assign probabilities. A metric may be biological fitness or physical energy or the output of an arbitrarily complicated function that assigns quantitative values to members of the population. The first part of the Price equation describes how directly applied forces change the probabilities assigned to members of the population when holding constant the metrical values of the members—a fixed metrical frame of reference. The second part describes how the metrical values change, altering the metrical frame of reference. In canonical examples, the direct forces balance the changing metrical frame of reference, leaving the average or total metrical values unchanged. In biology, relative reproductive success (fitness) remains invariant as a simple consequence of the conservation of total probability. In physics, systems often conserve total energy. Nonconservative metrics can be described by starting with conserved metrics, and then studying how coordinate transformations between conserved and nonconserved metrics alter the geometry of the dynamics and the aggregate values of populations. From this abstract perspective, key results from different subjects appear more simply as universal geometric principles for the dynamics of populations subject to the constraints of particular conserved quantities.

1 | INTRODUCTION

To understand the dynamics of probability distributions, one must consider the forces and constraints that influence the change in populations. Many methods can be used to study dynamics. Here, I apply the Price equation, a highly abstract description of change in populations. The abstractness of the Price equation facilitates discovery and understanding of connections between seemingly different disciplines.
I use the Price equation to show the essentially identical basis for fundamental equations of natural selection, entropy, and information. I emphasize the first steps in how one might go about building a common framework in which to understand the similarities and differences between various disciplines. From this abstract perspective, key results from different subjects appear more simply as universal geometric principles for the dynamics of populations subject to the constraints of particular conserved quantities.

2 | OVERVIEW
This article provides the basis for unifying diverse subjects. Given the incompatible goals, methods, languages, and cultures of the different disciplines, it is useful to begin with an extended overview.
This overview serves only to orient in the direction of what follows, not as a complete summary unto itself. Readers who prefer to start with the details may wish to skip this section. Section 7 describes various identities and alternative partitions for the conservation of total probability. The different notational forms provide the basis for connecting seemingly different subjects to the common underlying geometric principles.
Section 8 considers frequency changes in relation to an abstract notion of force. By expressing frequency changes in terms of force, the Price equation partitions the conservation of total probability into two balancing components of change. The first component arises from directly acting forces with respect to a fixed frame of reference for the quantitative properties. The second balancing component of change arises from the inertial forces that alter the frame of reference.
The balance between the consequences of the direct and inertial forces provides an analogy to d'Alembert's principle of mechanics.
That connection establishes a first step in relating different disciplines to the common underlying geometric foundation.
Sections 9-11 transform the quantitative property of frequency change into logarithmic coordinates. In the canonical Price equation's partition of conserved total probability into direct and inertial components, the property of each type is its frequency change or growth rate, an analogy with biological fitness. In particular, the relative growth, or fitness, of the ith type is w_i = q′_i/q_i, the ratio of the derived frequency, q′_i, relative to the initial frequency, q_i. The change between the initial and derived frequency can be considered as a path divided into segments, in which the overall growth, w_i, combines multiplicatively over the segments, whereas the logarithmic coordinates, log w_i, combine additively over the segments.

Sections 12 and 13 continue to set the geometric foundations for analysis. When we divide a path of change into many small segments, then we can think of overall change as the combination of many small instantaneous changes in response to directly applied force at each point along the path.
For small changes, the direct force at each point becomes approximately the same for the initial linear coordinates of change, w i , and the logarithmic coordinates, log w i , apart from a constant shift that does not alter the dynamics. The convergence of linear and logarithmic coordinates with respect to small changes explains the common forms of many fundamental results in different fields of study.
Section 14 develops two complementary abstract notions of force.
In the canonical expression of the Price equation for the conservation of total probability, the "fitness" term w_i = q′_i/q_i simply describes the change in frequencies relative to the fixed frame of reference given by the initial frequencies.

Section 15 develops the deductive perspective by deriving the changes in frequencies for given initial frequencies and given forces.
The analysis applies the Lagrangian method, which maximizes the first component of the Price equation partition. That first component is an abstraction of the classical mechanics action term, as the virtual work of the direct forces with respect to a fixed frame of reference. The Lagrangian method generalizes the principle of least action.
The Lagrangian also includes various forces of constraint, such as the conservation of total probability, and any additional forces associated with other conserved quantities. The forces of constraint impose a limited set of potential paths that may be followed in the geometric space of frequency change. The actual path of change extremizes the action among those paths that are consistent with the forces of constraint.
Sections 16-18 present a partial maximum entropy production principle that follows from the dynamics of frequency change. To obtain this result, I partition the direct force into two components.
The first component becomes an additional force of constraint that expresses the invariance imposed by the conservation of some system quantity, such as energy or biomass or the direct change in some value.
The remaining component of the direct force is −log q_i, which can be thought of as the entropy or information in the ith dimension.
The entropy becomes the action term maximized by the path of change, leading to a path that maximizes the production of entropy.
Because the maximization is taken with respect to the fixed frame of reference defined by the initial population, ignoring any inertial forces that alter the frame of reference, one can think of the entropy production as the result of a partial change holding constant the frame of reference-the partial maximum entropy production principle.
Sections 19 and 20 develop the notion of a conserved system quantity as a force of constraint. Jaynes' maximum entropy analysis of thermodynamics and probability patterns follows as a special case of the general geometric principles of change in populations developed in earlier sections. From Jaynes' work and the later extensions of his theory to simple invariance principles, we have a unified framework in which to understand the relations between commonly observed probability distributions.
Section 21 discusses alternative ways in which to interpret maximum entropy paths. I argue that the most basic principles derive from the underlying geometry. Notions of entropy and information are simply interpretations of that geometry applied to particular disciplines of study.
Section 22 relates the path of change for populations to the Fisher information metric. That metric arises frequently in particular disciplines, including the fundamental approaches of information geometry.
Sections 23 and 24 briefly review key results. The Appendix provides brief histories of key topics and background references.

3 | SEPARATION OF FREQUENCY AND PROPERTY
The Price equation provides an abstract way in which to analyze changes in populations. The equation separates the frequency of entities from the property of those entities (Frank, 2012a;Price, 1972a).
Suppose, for example, that for entities with label i, we express frequency as q i and the average of the associated property value as z i . The z i values can be height, or energy level, or any quantity.
If entities with label i always have an average value, z_i, then frequency change completely describes population change. If the change in frequency between two populations is Δq_i = q′_i − q_i, then the change in the average value of z is

Δz̄ = Δq · z,  (1)

in which the dot product, Δq · z, is understood in the usual way as the sum of the element-wise product of two vectors, Σ Δq_i z_i.
Alternatively, one may separate frequency from property. Thus, we have differences in frequency, Δq_i = q′_i − q_i, and differences in property values, Δz_i = z′_i − z_i. For example, a transportation planner might study the overall assessment of changing modes of transport in a population. The index i could label different transportation modes, such as automobile, train, and so on. The frequency q_i is the fraction of individuals who travel by a particular mode. The quantity z_i may be the relative assessment of the value associated with a transportation mode.
The separation of frequency and property allows a more general description of change. Changes in the total assessment of transportation can arise from changes in the frequencies of usage, Δq_i = q′_i − q_i, and from changes in the assessment of value for each mode, Δz_i = z′_i − z_i.
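As a minimal numerical sketch of this two-part decomposition (the transport frequencies and assessment values below are hypothetical, not data from the article), the total change in the average assessment splits into a frequency part and a property part:

```python
import numpy as np

# Hypothetical data: three transport modes (car, train, bike).
q = np.array([0.6, 0.3, 0.1])    # initial usage frequencies
qp = np.array([0.5, 0.4, 0.1])   # derived usage frequencies, q'
z = np.array([2.0, 5.0, 4.0])    # initial assessment value per mode
zp = np.array([2.0, 5.5, 4.0])   # derived assessments, z'

dq, dz = qp - q, zp - z

freq_part = dq @ z               # change from shifting usage frequencies
prop_part = qp @ dz              # change from revaluing each mode
total = qp @ zp - q @ z          # total change in average assessment

print(freq_part + prop_part, total)  # the two parts sum to the total
```

The exact agreement of the two parts with the total change is the decomposition that the Price equation formalizes in the sections that follow.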

4 | SET MAPPING OF LABELS BETWEEN POPULATIONS
Our goal is to describe the change between two populations. We may arbitrarily label one population as the ancestor and the second population as the descendant. The general formulation concerns only the differences between populations, independently of any particular underlying scale of separation, such as space or time or updating in light of new evidence. In this section, I consider the example of separation between populations by time.
In one interpretation, people become train travelers only by learning about trains from someone who already travels by train; an individual train traveler maps to self as a descendant train traveler. In this case, each descendant train traveler maps to a train traveler in the ancestral population. Positive Δq_i reflects growth of the ith class by successful recruitment.
In a second interpretation, we could map descendant individuals to their mothers. Then, Δq i has to do with the number of babies produced by each mother. In this case, a descendant's label i is defined only by ancestral type. Descendants do not have their own types, only their mapping to an ancestral i.
We handle the fact that descendants may use travel modes that differ from their mother by adjusting the change in property value, Δz_i = z′_i − z_i. For mothers who travel by train, with property value z_i, their descendants have some average property value, z′_i, that accounts both for changes in travel mode by descendants and for changes in property value associated with each travel mode.
In the general, abstract interpretation, the label i applies only to the initial, or ancestral set. All entities from the second, or descendant, population map to ancestors, and thus derive their labels from their ancestors. We can use partial assignments, so that a descendant is made up of various fractions of ancestors, each descendant part accounted for separately by its assignment to an ancestral label, i.
At first glance, this set mapping abstraction may seem rather complicated and obscure. However, its great power arises from the fact that nearly all studies of changes in populations can be described by specific mapping assumptions and associated interpretations. Thus, anything that we can prove about the general abstract setup applies to the very many apparently different special cases that arise in different applications.

5 | THE PRICE EQUATION
The Price equation (Frank, 2012a;Price, 1972a) describes the change between two populations in the aggregate value of some property (this section is modified from Frank, 2015). Each component of the population has a frequency weighting, q, and a property value, z.
Begin with a discrete analog of the chain rule for differentiation of a product,

Δ(qz) = q′z′ − qz = (Δq)z + q′(Δz),

in which q′ = q + Δq and z′ = z + Δz. The same chain rule can be applied to vectors. Using dot product notation, we obtain an abstract form of the Price equation (Frank, 2012a,b, 2013)

Δz̄ = Δ(q · z) = Δq · z + q′ · Δz,  (2)

in which a dot product is understood in the usual way as q · z = Σ q_i z_i. This equation can be interpreted in various ways, as discussed in prior sections. In general analysis, I adopt the most abstract interpretation with regard to set mapping between two populations. Roughly speaking, we can take q_i to be the frequency associated with a subset, i, of the initial population, such that the total frequency is Σ q_i = 1.
Thus, z̄ = q · z = Σ q_i z_i is the average of z.
Here, z_i is an arbitrary function that maps i to some property value, and z_i is interpreted as the average of z in each dimension or subset, i. Because z can be any quantity, calculated in any way, this equation gives the most general expression for Δz̄, the change in the average of z. One can think of z̄ = Σ q_i z_i as a functional of the arbitrary function, z.

For a second population, with frequencies q′_i and values z′_i, we have Σ q′_i = 1, in which the primes denote the abstract mapping described in the prior section. Our only restriction is that we can map the index i between the two populations. We may define the average value in the second population as z̄′ = q′ · z′ = Σ q′_i z′_i.
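Because the identity holds for any frequencies and any property values, it can be checked numerically with arbitrary vectors. This is only an illustrative sketch, not part of the original derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary frequencies summing to one, and arbitrary property values.
q = rng.random(5); q /= q.sum()      # initial population
qp = rng.random(5); qp /= qp.sum()   # derived population, q'
z = rng.normal(size=5)
zp = rng.normal(size=5)

dq, dz = qp - q, zp - z

# Price equation: change in average = frequency term + property term.
lhs = qp @ zp - q @ z                # direct computation of the change
rhs = dq @ z + qp @ dz               # the Price equation partition
assert np.isclose(lhs, rhs)
```

The identity holds exactly for any choice of vectors, because it is an algebraic rearrangement rather than an approximation.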

6 | BIOLOGICAL FITNESS AND THE CONSERVATION OF TOTAL PROBABILITY
We may define an abstract analog of biological fitness. For a type or subset with label i, comprising frequency q_i in the ancestral population, the fraction of the descendant population derived from i is q′_i. Thus, the relative success of type i in contributing to the descendant population may be written as its relative fitness

w_i = q′_i / q_i.  (3)

Average relative fitness is always one, w̄ = q · w = Σ q′_i = 1, because the total frequency or probability is always a conserved value of one. In some articles, w_i is taken as an absolute measure of the number of descendants assigned to type i, and w̄ is the average number of descendants, which may differ from one. In that case, w_i/w̄ is relative fitness. Here, I am using w_i as the measure of relative fitness, with w̄ always equal to one. The following analysis does not differ under the alternative definitions, but it is important to keep in mind the distinct definitions that may be used.
If we use relative fitness for the abstract property in the Price equation of Equation (2), with z ↦ w, we obtain

Δw̄ = Δq · w + q′ · Δw = 0.  (4)

It is often useful to express fitnesses as deviations from their average value, which we obtain by subtracting one from relative fitness,

a_i = w_i − 1 = Δq_i / q_i,  (5)

which is known as Fisher's average excess in fitness (Fisher, 1958).
The average value, ā = q · a, is always zero; thus, we can write Equation (4) as

Δā = Δq · a + q′ · Δa = 0.  (6)
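A short sketch, with arbitrary made-up frequencies, illustrates why the average relative fitness is always one and the average excess is always zero, as simple consequences of the conservation of total probability:

```python
import numpy as np

rng = np.random.default_rng(1)
q = rng.random(4); q /= q.sum()      # ancestral frequencies
qp = rng.random(4); qp /= qp.sum()   # descendant frequencies, q'

w = qp / q                # relative fitness, w_i = q'_i / q_i
a = w - 1                 # Fisher's average excess, a_i = Δq_i / q_i

# q·w = Σ q'_i = 1 and q·a = Σ Δq_i = 0, for any pair of distributions.
print(q @ w)              # average relative fitness: always 1
print(q @ a)              # average excess in fitness: always 0
```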

7 | IDENTITIES FOR THE CONSERVATION OF PROBABILITY
We may express the conservation of total probability in a variety of equivalent forms. This section shows some of the variants. The purpose of these variants is to set up the discussion in the next section, in which we interpret the Price equation partition in Equation (6) as a partition of total change into two parts. The first part is the change ascribed to direct forces, F. The second part is the change ascribed to the altered context of the population, which may be thought of as a change in the frame of reference caused by inertial forces, I.
I will discuss the interpretation of direct and inertial forces in the next section. Here, we must first consider various notational manipulations, which by themselves do not have much obvious meaning.
The goal will ultimately be to discuss general aspects of change in populations subject to the constraint set by the conservation of total probability, which allows us to write the Price equation partition in Equation (6) as

Δā = Δq · F + Δq · I = 0.  (7)

We will need a toolkit of notational variants to establish this form and to show the connections between seemingly different subjects.
It is a bit tedious to set up the various notational identities, but it is important to do so to develop alternative interpretations and to avoid confusion. On first reading, one may wish to skim quickly through this section and then refer back to the notations as needed.
To start, note that q′ = q + Δq and Δa = a′ − a; thus, we can write the second term of Equation (6) as

q′ · Δa = q′ · a′ − q′ · a = −Δq · a,  (8)

because q′ · a′ = q · a = 0 are the average values of a, which are always zero. Thus, we end up with the seemingly trivial partition

Δā = Δq · a − Δq · a = 0,  (9)

which we will nonetheless find quite useful, because the partition provides some hints about the balance of direct and inertial forces in a conservative system. Before turning to that balance of forces in the next section, it is useful to consider some additional identities.
Each term in Equation (9) expresses the variance in fitness and, equivalently, a measure of the squared Euclidean distance through which the population moves,

Δq · a = q · a² = V_w,  (10)

in which a² is the vector of the squared terms, a²_i, and thus, q · a² is the second moment of a. Here, V_w is the variance in relative fitness, because a_i = w_i − 1 is relative fitness shifted so that the mean value of a is zero. Thus, the second moment of a is the variance.
The term q · a² can be thought of as a squared distance starting from an initial point at zero and moving through the distance given by the sum of the squared deviations in each dimension, a²_i, each dimension weighted by its frequency, q_i. Thus, the distance that the population moves in frequency space, caused by the changes in frequency given by variable fitnesses, is equivalent to the variance in fitness. Put another way, the reason that the variance in fitness always arises as the key metric in population change is that the variance describes the distance that the population moves.
We can also write

Δq · a = Σ (Δq_i)² / q_i,

which is a form that arises in information theory interpretations of frequency changes, and also clarifies the geometric squared distance interpretation of frequency changes (Amari & Nagaoka, 2000). We can write this equation in a nonstandard vector notation, which will be convenient to use in this article, as

Δq · a = Δq · (Δq / q),

in which a ratio of vectors implies element-wise division, and vectors distribute through parentheses as dot products.
We can also rewrite the second term of Equation (6) by rearranging Equation (8) as

q′ · Δa = Δq · a*,  (11)

in which

a* = q′ Δa / Δq,

which measures the nonlinearity, or bending, in the changes of q in subsequent steps, which is roughly like an acceleration.
Note that Equation (11) has Δq_i terms in the denominator, which may appear to be problematic when such terms include zero values. However, each term is always part of a dot product, yielding values of Δq_i a*_i = q′_i Δa_i for each term; thus, we can always interpret such terms directly by their actual value. The reason for splitting the terms in the manner of Equation (11) follows at the end of this section.
Note also that

Δq · a + Δq · a* = 0,  (12)

by the conservation of total probability. However, in each individual dimension, the values Δq_i(a_i + a*_i) are not necessarily zero. Although the total value is constrained to be zero, it is often useful to retain this term to emphasize the fact that the values in each dimension can vary.
We can combine the various pieces to express the Price equation partition for the change in relative fitness in Equation (6) as

Δā = Δq · a + Δq · a* = 0,  (13)

or, using a = Δq/q and a* = q′Δa/Δq, as

Δā = Δq · (Δq / q) + Δq · a* = 0.  (14)

The second form emphasizes that this expression is given purely as the nondimensional description of changes in frequency or probability. Later, it will be useful to replace the second term using the identity in Equation (12), leading to the form in Equation (9) expressed as

Δā = Δq · (Δq / q) − Δq · (Δq / q) = 0.
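The identities of this section can be checked numerically. The sketch below uses three arbitrary distributions in the sequence q → q′ → q′′, so that the subsequent-step excess a′ is defined; the particular numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
q = rng.random(4); q /= q.sum()        # initial distribution
qp = rng.random(4); qp /= qp.sum()     # q', the derived distribution
qpp = rng.random(4); qpp /= qpp.sum()  # q'', the subsequent step

dq = qp - q
a = dq / q                 # average excess in the first step
ap = (qpp - qp) / qp       # average excess in the subsequent step
da = ap - a

# Identity (10): the direct term equals the variance in fitness.
print(np.isclose(dq @ a, q @ a**2))
# Identity (8): the second term balances the first,
# because q'·a' = q·a = 0 by the conservation of total probability.
print(np.isclose(qp @ da, -(dq @ a)))
```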

8 | BALANCE OF DIRECT AND INERTIAL FORCES
The previous sections described the conservation of total probability, which imposes strong constraints on the geometry of change in populations. In particular, the dynamics of probability distributions must move along the constraint that the total probability remains unchanged. Within that constraint, the probability distributions that characterize populations may change in response to directly applied forces, such as biological fitness or physical forces or informational processes.
This section analyzes the changes in probability distributions in response to direct forces and subject to the constraint of conserved total probability. The previous section established the key equations. On the abstract side, Equation (7) presented the partition between the forces that directly change frequencies, F, and the forces that change the inertial frame of reference for the population, I, as

Δq · F + Δq · I = 0,

which expresses a nondimensional analogy of d'Alembert's principle with respect to the balance between the direct and inertial components (Lanczos, 1986). d'Alembert's principle describes classical physical laws of motion in systems that conserve total energy, for example, motion that does not lose energy by friction and dissipation of heat.
I previously discussed d'Alembert's principle in the context of frequency changes in populations (Frank, 2015). Here, I repeat a few key points from my previous article.
The term F is the vector of direct forces acting on the system, and the term I is the vector of inertial forces that balance the direct forces to achieve no net change. d'Alembert's principle can be thought of as a generalization of Newton's second law of motion (Lanczos, 1986), in which F̃ = μÃ is read as the total force, F̃, equals mass, μ, times total acceleration, Ã. Total force and total acceleration must include forces of constraint, which in our case means that Σ Δq_i = 0. If we write total inertial force as Ĩ = −μÃ, then Newton's law is F̃ + Ĩ = 0.
In d'Alembert's formulation, the direct and inertial forces typically do not sum to zero, F + I ≠ 0, because those terms do not include the constraining forces that act on Δq. Instead, in d'Alembert's expression, Δq · (F + I) = 0, the term Δq · F combines the direct and constraining forces, and the term Δq · I combines all inertial forces, including any forces of constraint. Newton's law is a special case of the more general principle of d'Alembert (Lanczos, 1986).
Here is a simple intuitive description of d'Alembert's principle (Wikipedia, 2015). You are sitting in a car at rest, and the car suddenly accelerates. You feel thrown back into the seat. But, even as the car gains speed, you effectively do not move in relation to the frame of reference of the car: Your velocity relative to the car remains zero. That net zero velocity can be thought of as the balance between the direct force of the seat pushing on you and the inertial force sending you back as the car accelerates forward.
As long as your frame of reference moves with you, then your net motion in your frame of reference is zero. Put another way, there is a changing frame of reference that zeroes net change by balancing the work of direct forces against the work of inertial forces. Although the system is a dynamic expression of changing components, it also has an overall static, equilibrium quality that aids analysis. As Lanczos (1986) emphasizes, d'Alembert's principle "focuses attention on the forces, not on the moving body…"

In terms of explicit notation for changes in frequencies, the previous section developed a Price equation expression for the partition of direct and inertial forces in Equation (14) as

Δq · (Δq / q) + Δq · a* = 0,

with analogy to d'Alembert's form by expressing the direct and inertial forces as

F = Δq / q,    I = a* = q′ Δa / Δq.

For frequency changes, one can think of a coordinate system that locates a population as a point defined by the population's frequency or probability distribution. The direct work done to move the population in that coordinate system is Δq · F, the sum of the force multiplied by the displacement in each dimension, calculated when holding constant the frame of reference defined by the coordinate system. That direct work is balanced by the inertial work done to accelerate the reference frame coordinate system by a total amount Δq · I, which relocates the altered population and its associated forces so that it appears in the new frame of reference to have a net total displacement multiplied by force of zero.
I use the word "force" here in an abstract, nondimensional manner, rather than in the specifically defined manner of classical physics. Such words can be a barrier to interdisciplinary insight and understanding.
Readers highly trained in particular disciplines, such as physics, sometimes believe that a word such as "force" has a single correct meaning and associated units of expression. Any variant use of the word is thought to be misleading or mistaken. I take the opposite view. The underlying nondimensional geometry expresses the purest abstract notion of such concepts.
In each separate discipline, the particular dynamics and related equations have terms that take on specific interpretations, units, and meaning. Those specific aspects arise from the application of the same underlying universal geometry to particular problems, which usually means the same underlying conserved quantities and associated symmetries. The same geometry and abstract concepts will take on different units and interpretations in different disciplines.

9 | AVERAGE FORCE ALONG A PATH
In the Price equation description of change, we have only the differences between two populations. The two populations describe the initial and final probability distributions, q and q′. Each distribution can be thought of as a single point in a space of probability distributions.
The separation between the two points is a nondimensional change that can be small or large. There is no underlying parameter, such as time or spatial distance, that defines the scale of separation and the path of change that connects the points.
Most applications analyze changes along a path with respect to an underlying parametric scale. To relate the Price equation to other theoretical frameworks, it is useful to add an abstract notion of change along a parametric path that connects the initial and final probability distributions.
Let θ be a parameter that describes change along a path that connects q ≡ q(θ0) to q′ ≡ q(θ), in which Δθ = θ − θ0, so that

q_i(θ) = q_i(θ0) e^(r_i Δθ).

We can set θ0 = 0 and thus write θ ≡ Δθ. For notational convenience, let the dependence of q(θ) on the parameter θ be implicit, so that we can write the same expression more simply as

q′_i = q_i e^(r_i θ).  (17)

We can think of r_i as the average force acting along the path that moves the system from q_i to q′_i with respect to total path length, θ = Δs², in the parametric length scale, s. Thus, r_i θ is the total force in the ith dimension along the path of change. For our purposes, we can treat s as a nondimensional scale, and think of r_i as having nondimensional units of 1/s², interpreted as a nondimensional force or acceleration. In biology, the force r_i is interpreted as the Malthusian expression of biological fitness in analyses of natural selection, connecting the abstract analysis here to models of biological evolution (Frank, 2015).
Note that

r_i = (1/θ) log(q′_i / q_i) = Δ(log q_i) / Δθ,

so that we may think of r_i as the average change in logarithmic coordinates of probability with respect to changes in the parametric length scale, Δθ = Δs².
We can express the total nondimensional force in these logarithmic coordinates acting along the path of change from q to q′ as

m_i = r_i θ = log(q′_i / q_i) = log w_i.

Because m_i = log w_i, we can think of m_i as log fitness. Using m_i to express fitness, or force, the expression for change along a path in Equation (17) becomes

q′_i = q_i e^(m_i).
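A small sketch illustrates the path parameterization: given hypothetical endpoints q and q′ and an arbitrary path length θ, the average force r exactly recovers the endpoint frequencies:

```python
import numpy as np

q = np.array([0.5, 0.3, 0.2])    # initial frequencies (hypothetical)
qp = np.array([0.4, 0.35, 0.25]) # derived frequencies, q'
theta = 2.0                      # arbitrary parametric path length

m = np.log(qp / q)               # log fitness, m_i = log w_i
r = m / theta                    # average force along the path

# The total force r_i * theta recovers the endpoint: q'_i = q_i e^(r_i θ).
print(np.allclose(q * np.exp(r * theta), qp))
```

Rescaling θ rescales r inversely, leaving the total force r θ, and hence the endpoint, unchanged.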

10 | COMPARING LINEAR AND LOGARITHMIC COORDINATES
In linear coordinates, for each implicit i, we combine forces multiplicatively,

w = (q̂ / q)(q′ / q̂),

in which q̂ separates the path (q, q′) into the segments (q, q̂) + (q̂, q′), with q̂ between q and q′.
In logarithmic coordinates, we combine forces additively,

m = log(q̂ / q) + log(q′ / q̂).

The two coordinate systems describe the same total fitness, or force, as

w = e^m.

We can decompose any fitness value and its associated vector, (q, q′), into a large number of small pieces. In principle, we could analyze large changes in frequency, Δq = q′ − q, by combining the changes along each small segment in a decomposition of total change.
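The multiplicative and additive compositions can be illustrated with a hypothetical intermediate point q̂ on the path (here taken, for illustration only, as the midpoint between q and q′):

```python
import numpy as np

q = np.array([0.5, 0.3, 0.2])    # initial frequencies (hypothetical)
qp = np.array([0.4, 0.35, 0.25]) # derived frequencies, q'
qm = 0.5 * (q + qp)              # an intermediate point q̂ on the path

# Linear coordinates compose multiplicatively across segments ...
w = (qm / q) * (qp / qm)
# ... while logarithmic coordinates compose additively.
m = np.log(qm / q) + np.log(qp / qm)

# Both recover the same total fitness, w = q'/q = e^m.
print(np.allclose(w, qp / q), np.allclose(m, np.log(qp / q)))
```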

11 | LOG COORDINATES, ENTROPY AND INFORMATION
The average value of log fitness is

m̄ = q · log(q′ / q) = −D(q‖q′),

in which

D(q‖q′) = q · log(q / q′) = Σ q_i log(q_i / q′_i)

is the Kullback-Leibler divergence (Cover & Thomas, 1991; Kullback, 1959). This divergence measures relative entropy by extending the classical measure of entropy, −q · log q, for a probability vector q, to a measure of the entropic divergence of q relative to a given probability vector, q′.
One can think of classical entropy for a probability vector, q, as a special case of the more general relative entropy by comparing q to a uniform distribution described by a constant probability vector in which q′_i = 1/N for all i. The Kullback-Leibler divergence is also a primary measure of information in statistics and information theory.
The properties of entropy and information derive from the fundamental geometric properties of logarithmic coordinates, such as the additivity described in the previous section.
From the equality above, m̄ = −D(q‖q′), we can write the change in mean log fitness as

Δm̄ = D(q‖q′) − D(q′‖q′′),

which measures the bending, or curvature, of the divergence between the populations in the sequence q → q′ → q′′. When the divergence between successive steps remains constant, then mean log fitness is invariant.
We can use the Price equation in Equation (2) to partition the total change in log fitness into direct and inertial components,

Δm̄ = Δq · m + q′ · Δm.

The direct component is

Δq · m = J(q, q′),

in which

J(q, q′) = Δq · log(q′ / q) = D(q‖q′) + D(q′‖q)

is the Jeffreys divergence. In earlier work, I showed that the Jeffreys divergence is the proper expression for the direct component of change caused by natural selection or, more generally, the component associated with direct forces when evaluated with respect to the fixed frame of reference given by the initial probability vector (Frank, 2012b).
For small changes, D and J converge to the Fisher information metric. Thus, analyses of small changes often invoke D, J, or Fisher information without distinguishing between the measures. For small changes, the Fisher information metric is often preferable, because it has many useful geometric properties (Amari & Nagaoka, 2000) and is more widely known than J. However, it is useful to keep in mind that, in general, J is the correct measure for the direct effect of natural selection, or for the direct component of change relative to a fixed frame of reference.
The inertial component is

q′ · Δm = Δm̄ − Δq · m,

the total change minus the direct component.
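These divergence identities are easy to check numerically. The following sketch uses hypothetical frequencies and a simple Kullback-Leibler helper function:

```python
import numpy as np

def kl(p, r):
    """Kullback-Leibler divergence D(p || r) for discrete distributions."""
    return np.sum(p * np.log(p / r))

q = np.array([0.5, 0.3, 0.2])    # initial frequencies (hypothetical)
qp = np.array([0.4, 0.35, 0.25]) # derived frequencies, q'

m = np.log(qp / q)               # log fitness for the step q -> q'

# Mean log fitness is the negative KL divergence of q from q'.
print(np.isclose(q @ m, -kl(q, qp)))

# The direct component Δq·m equals the Jeffreys divergence,
# the symmetric sum D(q||q') + D(q'||q).
print(np.isclose((qp - q) @ m, kl(q, qp) + kl(qp, q)))
```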

12 | SMALL CHANGES: PRELUDE
In the remainder of this article, I focus only on the small changes that arise from forces acting at a given point. Small changes correspond to a single small segment in any larger path. I focus on small changes for two reasons.
First, the conceptual relations between different disciplines can be seen most clearly in small changes around a focal point.
Second, analysis of larger changes requires either an assumed constancy of a force field, or potential function, or an explicit notion of how forces change with both time and the changing context of the population. Those required assumptions reduce the generality of any particular formulation and obscure the common conceptual basis of different subjects.
In the future, it would be useful to extend analysis to cases in which there is no meaningful decomposition of a large change vector into small segments and to cases in which there exists a constant force field for which one could reconstruct the path of change over a sequence of small segments. Such extensions exist within individual disciplines, but it remains unclear how to connect the analyses from those different subjects to a common unifying framework.

| SMALL CHANGES: ANALYSIS
When changes Δq_i = q′_i − q_i are small, I use the notation Δq_i → dq_i ≡ q̇_i. For linear coordinates, we may write a_i = Δq_i∕q_i → q̇_i∕q_i, and for logarithmic coordinates, when q̇_i∕q_i is small, we may write m_i = log q′_i∕q_i → q̇_i∕q_i. Because the consequence of forces is shift invariant in expressions such as Δq ⋅ F, the linear and logarithmic expressions of force, w and m, are equivalent for small changes. We may express this equivalence explicitly by noting that, in general, the direct component of change was given earlier as Δq ⋅ m = 𝒥(q, q′), which, when q̇_i∕q_i is small, we may write as q̇ ⋅ m = ∑ q̇_i²∕q_i = ℱ. This last expression is the Fisher information metric, which arises as the direct component of population change or natural selection (Frank, 2009), the limiting expression of the Jeffreys divergence given earlier.

| GIVEN FORCES
I have defined m_i = log q′_i∕q_i → q̇_i∕q_i as proportional to the force acting along the infinitesimal change q̇_i. These expressions describe a consistency relation between force and frequency change. Often, we wish to consider how extrinsic or given forces cause change, rather than simply express consistency.
Suppose, for example, that we have a given force vector, φ̂, acting at the point q in frequency space. The given force is the nondimensional vector φ̂ = log q̂∕q. Given the location, q, and the force vector, φ̂, the vector q̂ provides an alternative way to express the intensity of the force vector as log q̂∕q. We can multiply q̂ by an arbitrary positive constant, because the net consequences of a force vector are shift invariant. Thus, we may implicitly consider cq̂ as the target and choose q̂ to sum to one, satisfying the conservation of total probability.
As with m, we can write the total nondimensional force as a description of an exponential growth process, q̂_i = q_i e^{φ̂_i}, in which q̂_i is the endpoint of the exponential growth process that began at q_i. Thus, the location q and the "target" location q̂ are sufficient to describe the given force vector. In the following, we will only be interested in small changes, q̇, that result from the instantaneous given forces with respect to a fixed frame of reference. One goal will

be to find the changes, q̇, that arise from given forces and various constraints on change.
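The target description of force can be checked numerically. In the sketch below (illustrative distributions of my own choosing), scaling the target by a constant shifts every force component uniformly, and the target is recovered as the endpoint of exponential growth from q:

```python
import math

def force_from_target(q, q_hat):
    # Nondimensional force in each dimension: phi_i = log(q_hat_i / q_i)
    return [math.log(h / qi) for h, qi in zip(q_hat, q)]

q = [0.2, 0.3, 0.5]
q_hat = [0.25, 0.35, 0.4]    # illustrative "target" distribution, sums to one

phi = force_from_target(q, q_hat)

# Multiplying the target by any positive constant c shifts every force
# component by the same amount, log c: the net consequences are shift invariant.
phi_scaled = force_from_target(q, [3.0 * h for h in q_hat])
shifts = [b - a for a, b in zip(phi, phi_scaled)]

# The target is the endpoint of exponential growth that began at q:
# q_hat_i = q_i * exp(phi_i)
recovered = [qi * math.exp(f) for qi, f in zip(q, phi)]
```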
It is common in classical mechanics to define force, φ_i, in relation to coordinates, q_i, by the negative gradient of a potential function Φ, which for our definition of φ̂ leads to φ̂_i = −∂Φ∕∂q_i. We can use the potential function Φ = 𝒟(q||q̂) + κ(∑ q_i − 1), in which the second term expresses the constraint on total probability, so that the resulting force includes the force of constraint; by shift invariance, the additive constant from the constraint does not alter the dynamics. The average force, φ̄ = q ⋅ φ̂ = −𝒟(q||q̂), is also a relative entropy expression.
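The identity for the average force can be verified directly. This minimal sketch (illustrative numbers) computes the force components from the target and checks that their q-weighted average equals minus the relative entropy:

```python
import math

q = [0.2, 0.3, 0.5]
q_hat = [0.25, 0.35, 0.4]

# Force in each dimension derived from the target: phi_i = log(q_hat_i / q_i)
phi = [math.log(h / qi) for h, qi in zip(q_hat, q)]

# The average force equals minus the Kullback-Leibler divergence D(q||q_hat)
avg_force = sum(qi * f for qi, f in zip(q, phi))
kl = sum(qi * math.log(qi / h) for qi, h in zip(q, q_hat))
```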

| EXTREME ACTION AND FREQUENCY DYNAMICS
The given forces and the conservation of total probability do not by themselves tell us what frequency changes occur. In the study of frequency changes, the simplest variational approach (Lanczos, 1986) finds the extremum (maximum or minimum) of a Lagrangian subject to a constraint. In our case, we may write ℒ = q̇ ⋅ φ̂ − λ(∑ q̇_i²∕q_i − C²) − ξ ∑ q̇_i, in which we take as given the direct force in each dimension, φ̂_i, and λ and ξ are multipliers for the constraints.
We measure the total change caused by the direct forces as q̇ ⋅ m = ∑ q̇_i m_i = ∑ q̇_i²∕q_i. That expression comes from Price's separation of direct and inertial forces in Equation (19). In terms of classical mechanics (Lanczos, 1986), the expression q̇ ⋅ m is the virtual work of the direct forces, in which work is distance times force (ignoring mass).
Geometrically, we can think of the constraint in the second term as fixing the total path length moved in frequency space (Amari & Nagaoka, 2000), in which ∑ q̇_i²∕q_i = C² measures distance by the Fisher information metric for infinitesimal displacements, q̇, or, biologically, C² is the variance in fitness. I assume that C² is chosen so that a solution exists that satisfies the constraints. The final term constrains total probability to remain constant.
The constraints of ∑ q̇_i²∕q_i = C² and ∑ q̇_i = 0 do not by themselves determine which frequency changes actually occur. Many different frequency change vectors, q̇, satisfy those two constraints.
Given these forces and constraints, what actual path do the dynamics follow? In other words, what is the realized vector q̇? We can think of the first term in the Lagrangian as the action, and extremize the action subject to the given constraints (Lanczos, 1986). That action term is q̇ ⋅ φ̂, the product of the displacement times the given force, which is the virtual work. In this case, maximizing the virtual work in the Lagrangian finds the displacement q̇ aligned with the direct and constraining forces.
To find the extreme action path, we evaluate ∂ℒ∕∂q̇_i = 0, which yields q̇_i ∝ q_i φ*_i, in which φ*_i = φ̂_i − φ̄ is the excess force relative to the average, and ξ = φ̄ = ∑ q_i φ̂_i follows from satisfying the conservation of total probability and the assumption that the virtual displacements are small.
The constant of proportionality satisfies the constraint on total path length, yielding q̇_i = (C∕σ_φ) q_i φ*_i, in which σ_φ is the standard deviation of the direct forces. Choosing C = σ_φ gives the simple form q̇_i = q_i φ*_i.
Here, we have deduced a fundamental expression for frequency dynamics by the principle of extreme action. We can rewrite the expression for frequency dynamics as m_i = q̇_i∕q_i = φ̂_i − φ̄, which shows that the forces, m_i, may be arrived at inductively by consistency with given changes, q̇_i∕q_i. This expression also shows that the forces described by m are related by affine transformation to a vector of given forces, φ̂, from which one may deduce the actual frequency changes.
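The extreme-action dynamics can be checked numerically. The sketch below (illustrative numbers; C chosen equal to σ_φ so that q̇_i = q_i φ*_i) verifies that the deduced displacement conserves total probability and that its Fisher path length equals the variance of the forces:

```python
import math

q = [0.2, 0.3, 0.5]
q_hat = [0.25, 0.35, 0.4]

phi = [math.log(h / qi) for h, qi in zip(q_hat, q)]   # given forces
phi_bar = sum(qi * f for qi, f in zip(q, phi))        # average force
phi_star = [f - phi_bar for f in phi]                 # excess force

# Extreme-action dynamics with C = sd of the forces: dq_i = q_i * phi_star_i
dq = [qi * fs for qi, fs in zip(q, phi_star)]

# Conservation of total probability: the changes sum to zero
total_change = sum(dq)

# Fisher path length of the displacement equals the variance of the forces
path2 = sum(d * d / qi for d, qi in zip(dq, q))
var_phi = sum(qi * fs * fs for qi, fs in zip(q, phi_star))
```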

| DIRECT FORCES AND CONSTRAINING FORCES
The distinction between direct and constraining forces is arbitrary.
We may choose to describe a force by its constraint on allowable displacements, q̇, or by its inclusion in the direct forces, φ̂ ≡ F. The Lagrangian in Equation (23) defines the action to be extremized as the work done along the path, which is the total displacement, q̇, times the direct component of force, φ̂. We can use φ̂ rather than φ̂* = φ̂ − φ̄ for force, because we can ignore the constant, φ̄, and q̇ ⋅ φ̄ = 0.
The constraining forces in the Lagrangian of Equation (23) are the fixed path length, ∑ q̇_i²∕q_i = C², and the conservation of total probability, ∑ q̇_i = 0. We are free to relabel a component of the direct force as a constraining force (Lanczos, 1986). In practice, deriving the altered Lagrangian provides an easy way to see how the changed labeling of direct and constraining forces enters into the analysis.

Consider the direct forces as defined in Equation (21) as φ̂ = log q̂ − log q. We can think of this expression as the sum of two component forces, log q̂ and −log q. The virtual work term of the direct forces becomes q̇ ⋅ φ̂ = q̇ ⋅ log q̂ − q̇ ⋅ log q. We may choose to relabel q̇ ⋅ log q̂ as a force of constraint. The remaining term, −q̇ ⋅ log q, becomes the virtual work associated with the direct forces. The next section illustrates how this change in labeling can be useful.
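The split of the virtual work into the two component forces is an exact identity, which the following sketch (illustrative numbers; the displacement is an arbitrary small vector summing to zero) confirms:

```python
import math

q = [0.2, 0.3, 0.5]
q_hat = [0.25, 0.35, 0.4]
dq = [0.001, 0.002, -0.003]    # a small displacement with sum zero

# Total virtual work of the direct forces: dq . log(q_hat / q)
work_total = sum(d * math.log(h / qi) for d, h, qi in zip(dq, q_hat, q))

# Component relabeled as a force of constraint: dq . log(q_hat)
work_constraint = sum(d * math.log(h) for d, h in zip(dq, q_hat))

# Remaining direct component: -dq . log(q), the entropy-production term
entropy_term = -sum(d * math.log(qi) for d, qi in zip(dq, q))
```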

| CONSERVED SYSTEM QUANTITIES AS THE PRIMARY FORCES OF CONSTRAINT
In relabeling log q̂ as a constraining force, we may write log q̂ = log k − λz, in which log k is understood to be a constant vector with elements log k when used in a vector context, k is chosen so that ∑ q̂_i = 1 obeys the conservation of total probability, the term λ is a positive constant, and z_i > 0 is chosen to make the equality hold. Thus, we can express the force associated with q̂_i using z_i. The constraining force now becomes associated with the component q̇ ⋅ log q̂ = −λ q̇ ⋅ z, because q̇ ⋅ log k = 0. The advantage of using z is that we may define the force of constraint directly in terms of any system quantity that we may associate with z. Each z_i is, in this analysis, a given value associated with a subset i of the population. We can use any quantity for z, including energy or momentum or monetary wealth or a quantitative biological trait.
Often, underlying quantities of a system, x_i, become transformed by various processes before we evaluate the final quantity of the outcome, z_i. We may, in general, consider z_i = T(x_i), in which x_i is an intrinsic quantitative value associated with the subset i, and T(x_i) is a transformation that defines a scaling relation between the intrinsic x_i values and the constraining force, z_i. The analysis of pattern often reduces to understanding the processes that set the scaling relation, T (Frank, 2014).
Because we can define z_i = T(x_i) in any way, the quantity z̄ = q ⋅ z can represent almost any sort of functional on the system. This expression for z̄ is also the average value of z. It is often useful to consider changes in z̄, with infinitesimal change ż̄ = q̇ ⋅ z + q ⋅ ż, which we obtain by a simple chain rule expansion of the differential, yielding an infinitesimal expression of the Price equation given in Equation (2).
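The discrete version of this Price partition is an exact identity, which a finite-difference check makes concrete (the numbers below are illustrative): the total change in the average splits into a frequency-change term evaluated in the fixed frame and a property-change term evaluated with the new frequencies.

```python
q  = [0.2, 0.3, 0.5]
z  = [1.0, 4.0, 9.0]
dq = [0.001, 0.002, -0.003]   # frequency changes, summing to zero
dz = [0.01, -0.02, 0.03]      # property-value changes

q2 = [a + b for a, b in zip(q, dq)]
z2 = [a + b for a, b in zip(z, dz)]

zbar  = sum(a * b for a, b in zip(q, z))
zbar2 = sum(a * b for a, b in zip(q2, z2))

direct   = sum(a * b for a, b in zip(dq, z))   # dq . z : fixed frame of reference
inertial = sum(a * b for a, b in zip(q2, dz))  # q' . dz : changed frame of reference
```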
If q̇ ⋅ z is constrained, then that constraint defines the constraint on q̇ in Equation (29). For example, the total system quantity z̄ may be conserved, which means that ż̄ = 0. If the z quantities do not themselves change, then q ⋅ ż = 0, and consequently, we have the constraint on the given forces q̇ ⋅ z = 0. We may also consider other ways in which q̇ ⋅ z is constrained, thereby defining the given forces that determine dynamics.

| MAXIMUM ENTROPY PRODUCTION PRINCIPLE
With the split between direct and constraining forces in Equation (27), and the expression of the constraining forces in terms of z in Equation (29), we can write a new Lagrangian that is equivalent to the Lagrangian in Equation (23), using dot product notation, ℒ = −q̇ ⋅ log q − λ(q̇ ⋅ z − B) − ξ(∑ q̇_i²∕q_i − C²) − κ ∑ q̇_i. The first term is the total action to be maximized, which is the virtual work of the direct forces, q̇ ⋅ F = −q̇ ⋅ log q. The other terms describe the constraints on the path that q̇ may follow. I assume that C² and B are chosen such that a solution exists.
The classical definition of entropy is −q ⋅ log q. Thus, the path q̇ that maximizes q̇ ⋅ F = −q̇ ⋅ log q, subject to the constraints on q̇, is, in the limit of small changes, the path that maximizes the production of entropy subject to the constraints: the maximum entropy production principle (see Appendix for references).
The idea is that the most likely path is the one that maximizes the production of entropy, which is equivalent to the maximization of the virtual work of the direct forces, q̇ ⋅ F = −q̇ ⋅ log q, subject to the constraints on q̇. The constraints on q̇ include all forces that determine the location of log q̂ = log k − λz.
The maximum entropy production principle is always true, in the sense that one can always split the total direct forces, φ̂, into a constraining component, log q̂, and a direct component, −log q. The extent to which maximum entropy production is meaningful depends on two questions. First, how meaningful is it to treat log q̂ = log k − λz as a constraint? Second, how meaningful is it to consider paths of change in the context of the Price equation separation of direct and inertial forces, a generalization of d'Alembert's principle?
In order to answer those questions about maximum entropy production, the next section analyzes dynamics with respect to z as a constraint. The following section discusses the Jaynesian theory of maximum entropy in relation to equilibrium thermodynamic expressions for common probability distributions. After those two sections, I return to the broader question of how to interpret the maximum entropy production principle in terms of the Price equation.

| MAXIMUM ENTROPY PATH SUBJECT TO CONSTRAINT
To interpret the meaning of z as a constraint, we return to the Lagrangian in Equation (31). That Lagrangian is equivalent to the form in Equation (23), with the direct forces split into the entropic component, −log q, and the constraining component set by z.

The term β_{φ̂z} is the regression coefficient of φ̂_i on z_i, which transforms the scale for the forces of constraint imposed by z to be on a common scale with the direct forces of entropy, −log q. The term B∕σ_z² describes the required force of constraint on frequency changes so that the new frequencies move z̄ by the amount q̇ ⋅ z = B. The term σ_z² is the variance in z.
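The regression relation can be checked numerically. In the sketch below (illustrative numbers; the dynamics are the extreme-action solution from earlier), the movement of z̄ caused by frequency change equals the q-weighted covariance of the forces with z, so that the regression coefficient of force on z times the variance of z recovers B:

```python
import math

q = [0.2, 0.3, 0.5]
q_hat = [0.25, 0.35, 0.4]
z = [1.0, 4.0, 9.0]

phi = [math.log(h / qi) for h, qi in zip(q_hat, q)]
phi_bar = sum(qi * f for qi, f in zip(q, phi))
dq = [qi * (f - phi_bar) for qi, f in zip(q, phi)]   # extreme-action dynamics

# Movement of zbar caused by frequency change: B = dq . z
B = sum(d * zi for d, zi in zip(dq, z))

# q-weighted covariance of the forces with z, and the variance of z
z_bar = sum(qi * zi for qi, zi in zip(q, z))
cov = sum(qi * (f - phi_bar) * (zi - z_bar) for qi, f, zi in zip(q, phi, z))
var_z = sum(qi * (zi - z_bar) ** 2 for qi, zi in zip(q, z))
beta = cov / var_z   # regression of force on z, so B = beta * var_z
```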
When the z values change, the changing frame of reference with respect to z follows from Equation (30) as the term q ⋅ ż.

| EQUILIBRIUM THERMODYNAMICS AND PROBABILITY
This section analyzes how the system equilibrium arises from the direct force causing maximum increase in entropy and the constraining forces imposed by z. That equilibrium can be interpreted as the maximum entropy probability distribution.
The dynamics are expressed in Equation (24) as q̇_i = q_i φ*_i. Equilibrium requires that the forces be constant in each dimension, thus φ*_i = 0. By shift invariance, we can take that condition as the forces in each dimension given by φ̂_i = log q̂_i∕q_i = 0, which means that the equilibrium condition can be written as log q_i = log q̂_i. We can express q_i in terms of the system quantities, z, that set the forces of constraint. From Equation (28), we write the equilibrium condition as log q_i = log k − λz_i, or q_i = k e^{−λz_i}. That probability distribution is the classic Jaynesian thermodynamic equilibrium (Jaynes, 1957a, 1957b, 2003) that arises by maximizing entropy subject to a constraint on z. That constraint is usually interpreted as a conserved quantity, such that ż̄ = 0, and q̇ ⋅ z = q ⋅ ż = 0. We can use multiple constraints on a set of system values z_j, and replace λz_i by ∑_j λ_j z_ij summed over j. For simplicity, I focus on a single constraint.
Suppose we want to find a Lagrangian that leads to the Jaynesian equilibrium, in which the defined forces arise from a constraint on a conserved system quantity, z̄ = q ⋅ z = μ. The following Jaynesian Lagrangian does the job, ℒ = ℰ − κ(∑ q_i − 1) − λ(∑ q_i z_i − μ), in which ℰ = −∑ q_i log q_i is the classical expression for entropy defined earlier. This Lagrangian is simply the entropy, ℰ, subject to two constraints. First, the total probability must be one. Second, the system quantity z̄ = ∑ q_i z_i is conserved and equal to μ. The terms κ and λ are the Lagrange multipliers that adjust to guarantee that the constraints are satisfied.
Maximum entropy subject to the constraints requires ∂ℒ∕∂q_i = 0, which yields the maximum entropy probability distribution q_i = k e^{−λz_i}, in which log k = κ − 1, and λ = 1∕μ. We can extend this result to unify the commonly observed probability distributions within a single framework by noting that z_i = T(x_i) is an arbitrary scaling relation of an underlying value, x_i (Frank, 2014, 2016). Two conclusions follow. First, equilibrium probability distributions at maximum entropy express the force of constraint on total probability and the forces of constraint on total system quantities. The point of maximum entropy occurs at the minimum relative entropy, 𝒟(q||q̂), which is achieved as q → q̂.
Second, pattern follows from the values of z that set the forces of constraint and thus the magnitudes of q. How the z values arise has not been specified. Thus, the study of pattern often reduces to the study of how various processes set z. The analysis here clarifies how those processes and the associated maximum entropy probability distribution relate to the universal Price equation expression for the dynamics of populations.
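The maximum entropy equilibrium can be computed directly. The sketch below (illustrative values; the function name `maxent` is my own) solves for the multiplier λ by bisection so that a discrete distribution q_i ∝ e^{−λz_i} satisfies the constraint z̄ = μ:

```python
import math

def maxent(z, mu, lo=-50.0, hi=50.0):
    # Bisect on the multiplier lam so that the distribution
    # q_i ∝ exp(-lam * z_i) has average sum q_i z_i equal to mu.
    def mean_for(lam):
        w = [math.exp(-lam * zi) for zi in z]
        s = sum(w)
        return sum(wi * zi for wi, zi in zip(w, z)) / s
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > mu:
            lo = mid          # the constrained mean decreases as lam increases
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [math.exp(-lam * zi) for zi in z]
    s = sum(w)
    return [wi / s for wi in w], lam

z = [1.0, 2.0, 3.0, 4.0]
q, lam = maxent(z, mu=2.0)
mean = sum(qi * zi for qi, zi in zip(q, z))   # ≈ 2.0: the constraint is satisfied
```

The resulting distribution has the Gibbs form, log q_i = log k − λz_i, matching the equilibrium condition in the text.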

| INTERPRETATION OF MAXIMUM ENTROPY PATH
The previous sections analyzed forces in terms of Price's partition of direct and inertial forces, an abstract generalization of d'Alembert's principle of mechanics. By analogy with d'Alembert's principle, the Price equation term q̇ ⋅ F can be thought of as an abstraction of the virtual work associated with the direct and constraining forces.
The direct forces are F. The constraining forces are included in the allowable set of displacements, q̇, taken relative to the fixed frame of reference. Such displacements relative to a fixed frame of reference are sometimes called virtual displacements, thus the name virtual work for the term q̇ ⋅ F. The Lagrangian expressions provide a method for maximizing the virtual work subject to the constraints that limit the possible set of displacements.
We may interpret the partition of direct and constraining forces in different ways, to match the interpretation of different problems.
In this article, I split the total direct forces into a direct force that increases entropy, F = −log q, and a set of potential virtual displacements, q̇, that obey the forces of constraint defined by conservation of a functional, z̄, of the system quantities, z, where one can think of each z_i as a function on the subset, i, of the population.
In particular, I defined the total direct forces by φ̂ = log q̂∕q, and then split those forces as φ̂ = log q̂ − log q. If we take φ̂ as the direct forces, then the frequency changes can be obtained from the Lagrangian in Equation (23) that maximizes the action q̇ ⋅ φ̂, which is equivalent to minimizing the change in relative entropy, 𝒟(q||q̂).
If we take −log q as the direct forces, then the frequency changes can be obtained from the Lagrangian in Equation (31) that maximizes the action −q̇ ⋅ log q, which is equivalent to maximizing the gain in entropy, ℰ.
In other words, the realized path maximizes the production of entropy when analyzed within the fixed frame of reference, thus the maximum entropy production principle. That conclusion holds only in the d'Alembert-Price distinction between direct and constraining forces, in which we choose to interpret all direct forces except entropy production as constraining forces on the possible virtual displacements, q̇. In addition, the changes in frame of reference that typically arise from change in location, q, or from change in the constraining forces, are separated by the Price equation approach into the consequences of the inertial forces.
Maximum entropy production only holds for the partial change from the direct forces, when separating all direct forces other than entropy into the constraints, and when ignoring changes in the frame of reference associated with the inertial forces.
Does it make sense to follow this particular partition of forces into components? There is no correct answer to that question. The principle exists. The interpretations of usefulness and meaning will always have a strongly subjective aspect.
I follow Lanczos (1986) in the claim that separating direct, inertial, and constraining components is the great unifying perspective in the study of forces. In many systems, it makes sense to describe most of the applied forces in terms of the constraining forces of conserved system quantities. Often, all that remains is the only truly universal force, the increase of entropy, which completes the description of the total direct forces acting on a system.
In some cases, it may make sense to use a different partition of applied forces into direct and constraining component forces. When the remaining direct component of force differs from entropy alone, then it would appear that the system does not follow the maximum entropy production principle. However, it is better to say that the maximum entropy production principle always holds, but alternative expressions may provide a more meaningful perspective for particular problems.
In this interpretation, entropy is simply a geometric description of position and change for probability distributions when located in logarithmic coordinates. That fundamental geometry explains the universality of entropy, or information, in widely different disciplines and applications.

| GEOMETRY AND THE FISHER INFORMATION METRIC
We can write the conservation of total probability expression in Equation (15) for small changes, q̇, as q̇ ⋅ (F + I) = ℱ_F + ℱ_I = 0, in which ℱ = ∑ q̇_i²∕q_i is the Fisher information metric, and the subscripts on ℱ denote the direct and inertial components of the Price equation.
In various models of natural selection, information, and entropy, different measures arise in terms of the Jeffreys divergence, 𝒥, the Kullback-Leibler divergence, 𝒟, and the Fisher information metric, ℱ.
Confusion sometimes occurs, because in the limit of small changes, all three measures converge to an equivalent form that often appears as the Fisher information metric. That limiting equivalence hides the significant differences between the measures and the different situations to which each measure naturally applies.
The Fisher information metric is used in many applications (Cover & Thomas, 1991; Kullback, 1959). For example, Frieden (2004) has emphasized that this Fisher information partition subsumes nearly all of the key results of theoretical physics. Similarly, the subject of information geometry subsumes nearly all of the classical aspects of statistical inference through a Riemannian geometry based on the Fisher information metric (Amari & Nagaoka, 2000).
From the general perspective of the Price equation and d'Alembert's form for the conservation of total probability in Equation (7), the partition into Fisher information components arises as a special case in the limit of small changes (Frank, 2015). In that special case of Fisher information, in which q̇ ⋅ F = ℱ_F, one does not separate the forces of constraint from the other directly applied forces. Instead, all directly applied and constraining forces combine into a single quantity that describes the path, in which that path has a natural geometric expression in terms of the Fisher information metric. That geometry is very useful in many applications. But it is important to recognize the more general perspective of Price and d'Alembert, which allows a deeper conceptual understanding of the different roles played by directly applied forces, constraining forces, and inertial forces.
One can think of the maximum entropy production principle in terms of Fisher information geometry. The universal direct force that increases entropy is always present. In addition to that universal direct force, various additional constraining forces combine to influence the curvature of the space of allowable virtual displacements. The direct and constraining forces combine to determine the paths of change within the Fisher information geometry (Amari & Nagaoka, 2000). The work of the direct forces describes change in the context of the fixed frame of reference given by the initial population. The total change depends on how the frame of reference changes, captured by the second term, q′ ⋅ Δa, as in Equation (11).

| DIRECT WORK, INFORMATION, AND ENTROPY
Often, it is difficult to interpret the changing frame of reference in a simple way. Instead, the strongest universal principles come from study of the work of the direct forces: the partial change caused by the direct forces with respect to the fixed initial frame of reference.
The work of the direct forces may be partitioned into components of directly applied forces, F, and constraining forces expressed by the allowable displacements, Δq. One can make that partition in a variety of ways according to the interpretation of a particular system. The emphasis on forces helps greatly in understanding the causes of change (Lanczos, 1986).
Fitnesses, w_i = q′_i∕q_i, are ratios of probabilities. Geometrically, it is convenient to have identical ratios correspond to identical distances between coordinates of probability. We achieve that identity by expressing fitness in logarithmic coordinates, m_i = log w_i. When we interpret fitness as a force, the logarithmic coordinates change the multiplication of fitness components of force into the addition of the logarithmic fitness components of force, as in Equation (18). In the Price equation, we can use any arbitrary coordinates, z, for the quantitative property values associated with probabilities. We can think of those arbitrary coordinates as a geometric transformation of the fundamental coordinates of conserved probability and fitness, w ↦ z. Equivalently, we may write a ↦ z, because a = w − 1, and the shift by a constant does not alter the description of change. In these Price equation descriptions of change, we have taken the fitnesses as given, and equated fitness or the logarithm of fitness with a notion of force. That approach is essentially inductive, in which we take the probabilities as given locations, w_i = q′_i∕q_i, and implicitly induce the force that would be consistent with the change from q_i to q′_i.

| PARTIAL MAXIMUM ENTROPY PRODUCTION
The main point of this article is to analyze the traditional deductive perspective of dynamics with respect to force. In that traditional perspective, we begin with the initial location of the population, q, and given forces, which we denote F ≡ φ̂. From those given conditions, we then deduce the changes in location and the new probabilities, q′. I confined the analysis to the study of small changes, q̇.
To obtain the dynamics, q̇, from the initial location and the given forces, I first wrote the Lagrangian expression for each particular case.
The Lagrangian focuses on a first term, often called the action, which is either maximized or minimized (extremized). When minimized, the procedure follows the principle of least action, but more generally, the procedure is known as the principle of extreme action.
In this article, I maximized the virtual work of the given direct forces, q̇ ⋅ F = q̇ ⋅ φ̂. Intuitively, this simply means that the changes will follow the lines of force in relation to the magnitudes of the force in each dimension. However, we must consider both the direct and constraining forces.
The Lagrangian approach provides a natural way to combine direct and constraining forces. In each Lagrangian, the first term gives the virtual work of the direct forces to be maximized. The remaining terms give the constraints that must be satisfied, usually as some total quantity that is conserved when summed over all dimensions of the system. The Lagrangian procedure transforms the system constraints into the constraining force components in each dimension.
The various results in the text show how different kinds of constraints and different ways of separating overall force into direct and constraining components determine the change in frequencies.
The key result concerns the partial maximum entropy production principle, which I briefly review. I expressed the given forces as φ̂ = log q̂∕q. Thus, the virtual work of the given forces in Equation (27) is q̇ ⋅ φ̂ = q̇ ⋅ log q̂ − q̇ ⋅ log q. I assumed that there is some quantity, z, such as energy or biomass or any other appropriate measure, that is constrained so that the total direct changes in that quantity are q̇ ⋅ z = B. We may relabel the part of the given forces, log q̂, as a constraining force associated with the fixed value imposed on direct changes in z, given by the expression in Equation (29) as q̇ ⋅ log q̂ = −λ q̇ ⋅ z. With this component labeled as a constraining force, the remaining part of the virtual work of the direct forces is −q̇ ⋅ log q, which in the limit for small changes is the production of entropy along the path of small changes, q̇. This component is the action term maximized along the path of change; thus, the path follows the direction that maximizes the production of entropy. I call this the partial maximum entropy production principle, because the result expresses the change in terms of the fixed frame of reference of the initial population state.
Total change must also evaluate any changes in the frame of reference through the inertial forces.
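The partial maximum entropy production claim can be tested numerically. The sketch below (illustrative numbers and a multiplier λ of my own choosing) builds the realized displacement from the extreme-action solution, then generates random alternative displacements satisfying the same three constraints (total probability, q̇ ⋅ z, and Fisher path length); none should produce more entropy than the realized path:

```python
import math
import random

random.seed(1)
q = [0.1, 0.2, 0.3, 0.4]
z = [1.0, 2.0, 4.0, 8.0]
lam = 0.5

# Constraining force written through z: log q_hat = log k - lam * z
w = [math.exp(-lam * zi) for zi in z]
k = 1.0 / sum(w)
q_hat = [k * wi for wi in w]

# Realized dynamics from the extreme-action solution: dq_i = q_i * (phi_i - phi_bar)
phi = [math.log(h / qi) for h, qi in zip(q_hat, q)]
phi_bar = sum(qi * f for qi, f in zip(q, phi))
dq = [qi * (f - phi_bar) for qi, f in zip(q, phi)]

def ent_prod(d):
    # Entropy production of a displacement d: -d . log q
    return -sum(di * math.log(qi) for di, qi in zip(d, q))

def fisher(a, b):
    # Fisher-metric inner product at q
    return sum(x * y / qi for x, y, qi in zip(a, b, q))

best = ent_prod(dq)
v2 = [zi - sum(z) / 4 for zi in z]   # centered z, used to project out the d.z constraint

alts = []
for _ in range(100):
    r = [random.gauss(0.0, 1.0) for _ in range(4)]
    # Project r so that sum(u) = 0 and u . z = 0: u preserves both linear constraints
    u = [ri - sum(r) / 4 for ri in r]
    c = sum(ui * vi for ui, vi in zip(u, v2)) / sum(vi * vi for vi in v2)
    u = [ui - c * vi for ui, vi in zip(u, v2)]
    if fisher(u, u) < 1e-12:
        continue
    # Slide along u while keeping the Fisher path length of dq unchanged
    s = -2.0 * fisher(dq, u) / fisher(u, u)
    d = [a + s * b for a, b in zip(dq, u)]
    alts.append(ent_prod(d))
```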
The entropy production principle simply expresses the basic geometry for the path of change when extrinsic forces are considered as constraints on system quantities, and logarithmic coordinates are used to locate populations. Because changes in probabilities as fitness or force have a natural expression as the ratio of probabilities, w_i = q′_i∕q_i, and such quantities combine multiplicatively, logarithmic coordinates arise naturally from the transformation that yields additive combinations. Thus, entropy production or changes in information arise as the inevitable consequence of the geometry of change when evaluated in the Price equation partition of direct and inertial forces.
In summary, several different disciplines share the same basic fundamental theory of change. From the perspective of the Price equation, we have seen common expressions for natural selection, aspects of physical mechanics and thermodynamics, entropy expressions for probability distributions, and common measures of information theory.

NATURAL SELECTION
Price originally formulated his equation as an expression of natural selection (Price, 1970, 1972a). In another article, without any direct connection to the Price equation, he speculated about a unified theory of change based on an abstract generalization of the principle of selection (Price, 1995). In other work (Price, 1972b), Price clarified one of the great puzzles in the history of evolutionary theory. In 1930, Fisher stated his fundamental theorem of natural selection as: "The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time." Fisher emphasized the exactness of the theorem and his belief that the theorem was a general and profound statement about natural selection. The puzzle is that Fisher's theorem holds exactly only under a very restricted set of assumptions (Crow & Kimura, 1970). Fisher is regarded as perhaps the greatest mathematical biologist ever. So the mismatch between Fisher's strong claim and the seemingly obvious failure of the theorem was hard to reconcile. Price (1972b) solved the puzzle. In the language of the present article, Fisher meant that the rate of increase in fitness equals the variance in fitness when evaluated with respect to the fixed frame of reference of the population's initial state. Selection acts as a direct force, with consequences of the direct force evaluated by holding constant the context. Any changes to the population that alter the fitnesses of individuals are regarded as consequences of inertial forces that alter the frame of reference. Price (1972b) did not use the language of direct and inertial forces, but he clearly understood Fisher's partition of total change into two components. Later work clarified a variety of early theories about natural selection within the context of Fisher's partition (Ewens, 1989, 1992; Frank & Slatkin, 1992).
In summary, Price left three separate insights about natural selection: the Price equation, the separation of frequency and property in an abstract mapping scheme, and Fisher's method of partitioning total change with respect to the frame of reference. My own work has unified those different pieces into an extended, more general and abstract interpretation of the Price equation (Frank, 1995, 1997, 2012a).
Another important line of work in evolutionary theory concerns the path of change in gene frequencies. Wright (1931, 1932) initiated the approach most closely related to analogies with classical mechanics. That line of work continues to be developed, including explicit connections to notions of entropy and statistical mechanics (de Vladar & Barton, 2011).
The studies initiated by Wright contrast with Fisher's approach (Frank, 2012c). In the language of this article, Fisher emphasized instantaneous change at a point and the partition of direct and inertial components of change. Fisher believed that the inertial components of change were too unpredictable to allow an explicit theory for the full path of change over significant lengths. By contrast, Wright and his descendants sought a theory of the paths of change over significant distances. This article emphasized the Fisherian perspective.

MAXIMUM ENTROPY PRODUCTION
Jaynes' theory of maximum entropy (Jaynes, 1957a, 1957b, 2003) emphasizes that probability distributions can be read as expressions of constraining forces (Frank, 2014).
For example, a Gaussian distribution expresses a constraint on the average distance of observations from the mean value. If one constrains that average distance of fluctuations from the mean, then the Gaussian distribution arises by maximizing the entropy subject to that constraint. Maximizing entropy is roughly equivalent to minimizing information or maximizing randomness. Jaynes' maximum entropy describes an equilibrium condition (Jaynes, 1957a, 1957b, 2003). The idea is that entropy increase is a ubiquitous force, a ubiquitous entropic force. Increasing entropy plus constraining forces together define the form of the equilibrium distribution.
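The Gaussian example can be reproduced on a discrete grid. In this sketch (grid, target variance, and function name `maxent` are illustrative choices of my own), constraining the average squared deviation from the mean and maximizing entropy yields a distribution whose log-probability is quadratic in x, the Gaussian form:

```python
import math

# Discrete grid of outcomes; constrain the average squared deviation from the mean
x = [i * 0.1 for i in range(-30, 31)]
m = 0.0
z = [(xi - m) ** 2 for xi in x]

def maxent(z, mu):
    # Bisect on lam so that q_i ∝ exp(-lam * z_i) satisfies sum q_i z_i = mu;
    # lam > 0 suffices because mu is below the unconstrained average of z.
    lo, hi = 0.0, 100.0
    def mean_for(lam):
        w = [math.exp(-lam * zi) for zi in z]
        s = sum(w)
        return sum(wi * zi for wi, zi in zip(w, z)) / s
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean_for(mid) > mu else (lo, mid)
    lam = 0.5 * (lo + hi)
    w = [math.exp(-lam * zi) for zi in z]
    s = sum(w)
    return [wi / s for wi in w], lam

q, lam = maxent(z, mu=0.25)   # constrain the variance to 0.25

# log q is quadratic in x: log q_i = const - lam * (x_i - m)^2, the Gaussian form
c0 = math.log(q[30]) + lam * z[30]
checks = [abs(math.log(qi) + lam * zi - c0) for qi, zi in zip(q, z)]
```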
The increase in entropy toward an equilibrium leaves open the problem of the dynamical path followed from initial condition to final equilibrium state. What characterizes the increments along that path?
One possibility is that each increment follows the direction that maximizes the increase in entropy: the path of maximum entropy production (MEP).
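As an illustrative toy model (my own construction, not from the article), one can read "the direction that maximizes the increase in entropy" as gradient ascent on entropy, projected so that each increment preserves normalization and a fixed mean "energy". The path then terminates at the Boltzmann-form maximum-entropy distribution for that mean energy; the state energies and starting distribution below are assumed values.

```python
import math

E = [0.0, 1.0, 2.0, 3.0]      # state "energies" (assumed values)
p = [0.7, 0.1, 0.1, 0.1]      # arbitrary starting distribution

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Orthonormal basis for the constraint directions (Gram-Schmidt):
# (1,...,1) preserves total probability, E preserves mean energy.
n = len(E)
u1 = [1.0 / math.sqrt(n)] * n
g = [Ei - dot(E, u1) * u1i for Ei, u1i in zip(E, u1)]
u2 = [gi / math.sqrt(dot(g, g)) for gi in g]

def mep_step(p, eps=0.001):
    d = [-math.log(pi) - 1.0 for pi in p]        # entropy gradient
    for u in (u1, u2):                           # project onto the
        c = dot(d, u)                            # constraint surface
        d = [di - c * ui for di, ui in zip(d, u)]
    return [pi + eps * di for pi, di in zip(p, d)]

mean_energy = dot(p, E)       # held fixed along the whole path
for _ in range(20000):
    p = mep_step(p)

# At the end of the path, log p_i is linear in E_i: the Boltzmann
# form p_i proportional to exp(-beta * E_i), Jaynes' equilibrium.
```

Because the constraints are linear in the probabilities, the projection preserves them exactly at every increment; entropy rises monotonically along the path until the gradient lies entirely within the constraint directions.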
Some authors have proposed MEP as a fundamental principle similar to the principle of least action (Dewar, 2005; Dewar, Lineweaver, Niven, & Regenauer-Lieb, 2014). By that view, essentially all realized paths of motion maximize the production of entropy. Other authors have suggested that MEP is only an approximate description of dynamics (Dewar et al., 2014). By that view, certain special systems follow MEP exactly, whereas many other systems follow MEP approximately or not at all. Whatever the status of MEP as a general principle, I have suggested that a purely geometric interpretation provides a more fundamental and universal perspective than does the entropy perspective of MEP. In particular, the conservation of total probability imposes strong geometric symmetry and constraint on the separation of direct and inertial forces (Frank, 2015). Maximum entropy production is a useful but often unnecessarily complicated way of expressing those fundamental geometric principles.
Returning to Jaynes, his goal was to express an abstract and general approach to understanding probability patterns. He sought to transcend the specific physical assumptions of statistical mechanics and thermodynamics, thereby achieving a more general theory that applied to a broader range of disciplines.
In several ways, Jaynes did not go far enough. For example, he retained entropy and information as primary quantities. Similarly, information geometry, based on metrics such as Fisher information, retains a notion of information as primary. In my view, the underlying geometry, conserved quantities, and symmetries provide the true foundation for analysis as, for example, in Frank (2016).

STATISTICAL INFERENCE AND LEARNING ALGORITHMS
This article showed that natural selection connects to universal expressions of population change and probability through the Price equation (Frank, 1995, 2012a; Price, 1970, 1972a). One can think of natural selection as an algorithm for accumulating information. Many authors have noted formal connections between natural selection and information theory (Frank, 2009, 2012b), Bayesian updating in statistical inference (Campbell, 2016; Harper, 2011; Shalizi, 2009), and learning algorithms (Campbell, 1974).
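The connection to Bayesian updating can be stated in one line: the discrete-time replicator update p_i' = p_i w_i / w-bar has exactly the form of Bayes' rule, with type frequencies as the prior, fitnesses as the likelihoods, and mean fitness as the evidence. A minimal numerical sketch (the frequencies and fitness values below are arbitrary illustrations):

```python
def replicator_step(p, w):
    # One generation of selection: p_i' = p_i * w_i / wbar.
    wbar = sum(pi * wi for pi, wi in zip(p, w))    # mean fitness
    return [pi * wi / wbar for pi, wi in zip(p, w)]

def bayes_update(prior, likelihood):
    # Bayes' rule: posterior_i = prior_i * likelihood_i / evidence.
    evidence = sum(pr * li for pr, li in zip(prior, likelihood))
    return [pr * li / evidence for pr, li in zip(prior, likelihood)]

p = [0.5, 0.3, 0.2]    # type frequencies / prior (assumed values)
w = [1.0, 1.5, 0.5]    # relative fitnesses / likelihoods (assumed)

# The two updates are the same arithmetic, term by term.
assert replicator_step(p, w) == bayes_update(p, w)
```

Iterating the update accumulates evidence generation by generation, which is one precise sense in which selection acts as an information-accumulating algorithm.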
Although initial connections have been made between natural selection and those different subjects, unification based on a deeper geometric foundation remains an open problem. For example, Jaynes' maximum entropy approach ultimately aimed to unify probability, information, statistical inference, and physical theories of statistical mechanics and thermodynamics (Jaynes, 2003). Another subject that might eventually coalesce with these is reinforcement learning (Sutton & Barto, 1998; Szepesvári, 2010), which provides the basis for aspects of neuroscience, cognitive science, and machine learning.
How do those various subjects relate to general underlying geometric principles for the dynamics of change in populations?