schema can show what two configurations have in common
one schema can represent multiple individuals
it’s a string over alphabet {0, 1, *}
order of schema = number of fixed values (the rest are metasymbols *)
length of schema (defining length) = the distance from the first fixed value to the last fixed value
if all solutions have common parts
it’s a good idea to keep them
it’s a part of the state space (we are narrowing it by fixing some values)
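as a quick illustration (my own sketch, not from the lecture), the order, defining length and matching of a schema string can be computed like this in Python:

```python
# Minimal helpers for schemata over the alphabet {0, 1, *} (names are mine).
def schema_order(schema: str) -> int:
    """Order = number of fixed (non-*) positions."""
    return sum(1 for c in schema if c != '*')

def defining_length(schema: str) -> int:
    """Defining length = distance between the first and the last fixed position."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def matches(individual: str, schema: str) -> bool:
    """An individual matches a schema if it agrees on every fixed position."""
    return all(s == '*' or s == c for s, c in zip(schema, individual))

# schema "1**0*1": order 3, defining length 5, and "100011" matches it
print(schema_order("1**0*1"), defining_length("1**0*1"), matches("100011", "1**0*1"))
```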
Schema Theorem
m(S, t+1) ≥ m(S, t) · f(S, t) · (1 − p)
m(S,t) - number of individuals with the schema S in time t
f(S,t) - the relative fitness of the schema S (relative to the whole population)
p - disruption probability, (1−p) is the survival probability
it basically says that schemata with above-average fitness and high survival probability grow in the population, whereas schemata with below-average fitness and lower survival probability shrink
because a lower survival probability (1 − p) makes the right-hand side smaller, so the lower bound on m(S, t+1) is smaller as well
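for intuition, with made-up numbers: if m(S, t) = 20, f(S, t) = 1.2 (20 % above average) and p = 0.1, then m(S, t+1) ≥ 20 · 1.2 · 0.9 = 21.6, so the schema is expected to grow; with f(S, t) = 0.8 the bound drops to 14.4 and it shrinks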
this Theorem is the foundation of the Building Block Hypothesis
GA works by finding and combining good schemas (“building blocks”) that lead to optimal solutions
short (low defining length), low-order and highly fit schemata survive best; they are recombined into longer, still highly fit schemata, which then form optimal solutions
the order of “good” schemata gradually increases, until complete solutions (with no wildcards) emerge
at the beginning I have a lot of small schemata (they survive easily)
at the end there is either a block with all values specified (a “full schema”) or several blocks with almost all values specified (near-optimal solutions)
these schemata were able to survive and are the fittest
Schema disruption problems
if the schema is short (small defining length), it is more likely to survive crossover and mutation
the longer it is, the more ways there are for e.g. one-point crossover to cut through it
I have a schema (a good building block) and it could be destroyed by one-point crossover (each part goes to a different child)
Linkage problem
= the chance of survival does not depend only on the schema order, but also on its length
schema having order = 4 and length = 4 will survive more likely than schema having order = 4 and length = 6
the 2-point crossover is less disruptive than 1-point crossover
mathematically, the chance of schema disruption is lower (the schema is at risk only when one cut falls inside it and one outside, or both cuts fall inside it)
this works mainly for compact schemas (which are the best performing in GAs)
but crossover with more points or uniform crossover is more disruptive than 1-point crossover
the solutions?
reordering and grouping common genes together
working with building blocks explicitly → Fast Messy GA
the more fixed positions a schema has (higher order), the more of them can be hit by mutation and the higher the chance of schema disruption
if the fitness of the schema is high
the schema will more likely survive crossover and mutation
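a rough sketch of the usual textbook survival estimate (my own code, not from the lecture; p_c and p_m are the crossover and mutation rates, which these notes do not define):

```python
# Rough textbook estimate of schema survival under one-point crossover and
# bit-flip mutation (assumed formula, not stated in the notes):
#   P_survive ≈ (1 - p_c * delta(S) / (l - 1)) * (1 - p_m) ** order(S)
def survival_probability(order: int, defining_length: int, string_length: int,
                         p_crossover: float, p_mutation: float) -> float:
    p_cross_survive = 1 - p_crossover * defining_length / (string_length - 1)
    p_mut_survive = (1 - p_mutation) ** order
    return p_cross_survive * p_mut_survive

# Same order, different defining length: the shorter schema survives more often
# (this matches the order-4 / length-4 vs length-6 comparison above).
print(survival_probability(order=4, defining_length=4, string_length=10,
                           p_crossover=0.8, p_mutation=0.01))   # ~0.62
print(survival_probability(order=4, defining_length=6, string_length=10,
                           p_crossover=0.8, p_mutation=0.01))   # ~0.45
```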
Fast Messy GA
one of the approaches to GA
the position information is not used in the algorithm (just for decoding)
the only useful information is the value of the gene itself
gene is a pair (position, value)
FMGA uses only the value, position is only for decoding
same in the Knapsack (I don’t care which items are first or last)
this representation can lead to underspecified and overspecified individuals
underspecified - not having all the genes
((0, 1) (2, 0)) → where is the gene on position 1?
solution: a reference individual (see below) is used (the value on position 1 is “borrowed” from it)
overspecified - having multiple values on one position
((0, 1) (2, 0) (1, 1) (2, 1)) → two values on position 2
solution: the first occurrence is used
reference individual
there is only one for the whole generation
it is always fully specified (no asterisks), so the fitness can always be calculated
it gradually gets better and better; it is the best individual constructed from the population so far
used to fill the blanks in underspecified individuals
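a minimal sketch (my own, names invented) of how decoding a messy chromosome against the reference individual could look:

```python
# Decode a messy chromosome, i.e. a list of (position, value) genes, against a
# fully specified reference individual:
#   - overspecified positions: the first occurrence wins
#   - underspecified positions: the value is borrowed from the reference
def decode(messy_genes, reference):
    result = list(reference)           # start from the reference individual
    seen = set()
    for position, value in messy_genes:
        if position not in seen:       # first occurrence is used
            result[position] = value
            seen.add(position)
    return result

reference = [0, 0, 0]
print(decode([(0, 1), (2, 0)], reference))                  # position 1 borrowed -> [1, 0, 0]
print(decode([(0, 1), (2, 0), (1, 1), (2, 1)], reference))  # first value at 2 kept -> [1, 1, 0]
```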
how does it work?
the principle is similar to a generic GA
at first, random strings are generated (mainly small, low-order and short) representing promising schemata
then selection and thresholding of strings to obtain a given order
using relative fitness
thresholding/shortening: removing genes which do not contribute to fitness (to maintain short and compact building blocks and eliminate bloat)
crossover operation (cut & splice)
both parents are cut at a random point
the 4 parts are then combined (so 12 new children are created (4*3 options))
children can have variable lengths
high-order, fitter blocks are gradually built from short schemata; this way we obtain good building blocks
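a rough sketch of cut & splice (my own simplification; the real operator also applies cut/splice probabilities and thresholding, which I leave out):

```python
import random

# Cut & splice on variable-length messy chromosomes (lists of (position, value)
# genes). Both parents are cut at one random point and all ordered pairs of
# distinct parts are spliced, giving up to 12 children of variable lengths.
# Assumes both parents have at least two genes.
def cut_and_splice(parent_a, parent_b):
    cut_a = random.randint(1, len(parent_a) - 1)
    cut_b = random.randint(1, len(parent_b) - 1)
    parts = [parent_a[:cut_a], parent_a[cut_a:], parent_b[:cut_b], parent_b[cut_b:]]
    children = [left + right
                for i, left in enumerate(parts)
                for j, right in enumerate(parts) if i != j]
    return children   # 4 * 3 = 12 children, variable lengths

a = [(0, 1), (1, 0), (2, 1), (3, 1)]
b = [(0, 0), (1, 1), (2, 0)]
for child in cut_and_splice(a, b):
    print(child)
```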
why is it better than standard GA?
more resistance to schema disruption (the position information travels with the gene value) and genes that work together can stay together
the children can be of variable lengths (more flexibility, more complexity)
FMGA identifies the good building blocks better
why is it worse than standard GA?
it’s really complex
computational complexity
complex operator implementation
a lot of parameters, difficult tuning
today, the standard GA with some improvements is used more often
FMGA is rarely used, sometimes in research, but the practical applicability is not big
Bayesian Optimization Algorithm (BOA)
instead of manipulating individuals directly, a probabilistic model which describes the population is built
the idea
similar to standard GA
GA tries to get better solutions by improving the individuals directly
in BOA we have a model representing the population of promising solutions (its properties)
the model is being refined (improved) to better capture what makes a solution good
and the model samples/generates better and better individuals
operations of crossover and mutation are replaced with probabilistic modelling
Principle
BOA, generation t: Population → Selection (keep good) → Build Model (learn structure) → Sample from Model (generate) → New Population
selection keeps the good/fit individuals and then a probabilistic model is created to capture the nature, properties and relationships between the fit individuals
e.g. if on position 1 is 1, then on position 2 is 0 (this property is in 90 % of fit individuals - it is probably a good property)
it tries to “explain” the observations in the individuals
before sampling from the model, the created model is optimized in a separate loop
operators such as adding/removing/reversing edges are used
the scoring metric is “how well does the model describe the current population?”
could be optimized using greedy algorithms, Simulated Annealing etc.
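a heavily simplified sketch of the select → build model → sample loop (my own code; the “model” here is only per-position marginal probabilities, so it is closer to UMDA than to real BOA, which learns a full Bayesian network with dependencies between genes, but the control flow is the same):

```python
import random

# Simplified estimation-of-distribution loop: select -> build model -> sample.
def boa_like_loop(fitness, length=10, pop_size=100, generations=30):
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: keep the better half of the population
        population.sort(key=fitness, reverse=True)
        selected = population[:pop_size // 2]
        # build model: estimate P(bit_i = 1) from the selected individuals
        model = [sum(ind[i] for ind in selected) / len(selected) for i in range(length)]
        # sample: generate the new population from the model
        population = [[1 if random.random() < model[i] else 0 for i in range(length)]
                      for _ in range(pop_size)]
    return max(population, key=fitness)

print(boa_like_loop(fitness=sum))   # OneMax: should converge towards all ones
```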
Bayesian network B
in general
it represents the structure of the problem
it describes the data (using conditional probabilities)
is used to generate new data of similar properties
it is represented as a DAG (directed acyclic graph)
in BOA
it represents probabilistic dependencies between genes (each edge links two genes)
is able to describe the statistical properties of selected individuals and generate new individuals with similar or better properties
Application
the population is described using a Bayesian network, which is a DAG describing the properties, dependencies and interactions of the selected population
conditional probabilities quantify these relationships
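a tiny sampling example (invented numbers, my own code) matching the note above about positions 1 and 2, showing how new individuals are generated by following the DAG order:

```python
import random

# Hand-made two-node network: gene 1 is 1 with probability 0.7, and gene 2 is 0
# with probability 0.9 when gene 1 is 1 (0.3 otherwise). All numbers invented.
p_gene1_is_1 = 0.7
p_gene2_is_0_given = {1: 0.9, 0: 0.3}   # P(gene2 = 0 | gene1)

def sample_individual():
    gene1 = 1 if random.random() < p_gene1_is_1 else 0
    gene2 = 0 if random.random() < p_gene2_is_0_given[gene1] else 1
    return [gene1, gene2]

print([sample_individual() for _ in range(5)])
```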
Genetic programming (GP)
the idea of genetic programming was originally to develop/evolve a computer program doing user-defined tasks (programmers won’t be needed “in the future”)
now it is used to develop/evolve digital and analog circuits, antennas, math formulas etc.
the results show that it is able to develop weird, unconventional/creative solutions which perform better than human designs
how does it work?
the input is a high-level program/circuit/antenna description
terminals, non-terminals (functions), constraints, fitness function, termination function
the output is a finished computer program/circuit/antenna
programs are internally represented using trees
internal nodes = functions/operators
leaf nodes = terminals (variables, constants)
operations:
crossover
randomly select a node in each parent and exchange them (along with their subtrees)
the result is a syntactically valid executable program (it proceeds to the next generation)
mutation
randomly change: function, terminal
replace subtrees with terminals, delete subtrees…
reproduction
probabilistically select an individual based on fitness and copy it into the new generation (basically elitism)
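a minimal sketch of such trees and subtree crossover (my own code, only one child produced for brevity):

```python
import copy
import random

# Expression tree: internal nodes hold functions/operators, leaves hold terminals.
class Node:
    def __init__(self, value, children=None):
        self.value = value                 # function name or terminal
        self.children = children or []     # empty list for leaf nodes

def all_nodes(tree):
    return [tree] + [n for child in tree.children for n in all_nodes(child)]

def crossover(parent_a, parent_b):
    child = copy.deepcopy(parent_a)
    donor = copy.deepcopy(parent_b)
    target = random.choice(all_nodes(child))
    source = random.choice(all_nodes(donor))
    # graft the donated subtree: the result is still a syntactically valid tree
    target.value, target.children = source.value, source.children
    return child

def to_string(tree):
    if not tree.children:
        return tree.value
    return '(' + (' ' + tree.value + ' ').join(to_string(c) for c in tree.children) + ')'

# (x + 3) * y   and   x - (y * 2)
a = Node('*', [Node('+', [Node('x'), Node('3')]), Node('y')])
b = Node('-', [Node('x'), Node('*', [Node('y'), Node('2')])])
print(to_string(crossover(a, b)))
```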