Definition

What are genetic algorithms? How are they used?
Where can they be applied?
What makes them a useful tool for engineers?


What are Genetic Algorithms?

Genetic Algorithms are nondeterministic, stochastic search and optimization methods that use the principles of evolution and natural selection to solve problems within complex solution spaces.

They are computer-based problem-solving systems which use computational models of some of the known mechanisms of evolution as key elements in their design and implementation. They belong to a wider family of algorithms, Evolutionary Algorithms (EAs). The major classes of EAs are: GENETIC ALGORITHMs, EVOLUTIONARY PROGRAMMING, EVOLUTION STRATEGIEs, CLASSIFIER SYSTEMs, and GENETIC PROGRAMMING. They all share a common conceptual base of simulating the evolution of individual structures via processes of selection, mutation, and reproduction. These processes depend on the perceived performance of the individual structures as defined by an environment.

EAs maintain a population of structures that evolve according to rules of selection and other operators, referred to as "search operators" (or genetic operators), such as recombination and mutation. Each individual in the population receives a measure of its fitness in the environment. Reproduction focuses attention on high-fitness individuals, thus exploiting the available fitness information. Recombination and mutation perturb those individuals, providing general heuristics for exploration. Although simplistic from a biologist's viewpoint, these algorithms are sufficiently complex to provide robust and powerful adaptive search mechanisms.
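As a rough illustration of this evaluate/select/vary cycle, here is a minimal Python sketch. The bit-string representation, the "one-max" fitness function, tournament selection, and all parameter values are illustrative assumptions, not part of the description above.

import random

def evolve(fitness, genome_length=20, pop_size=30, generations=50,
           mutation_rate=0.02):
    # Initial population of random bit-string chromosomes.
    pop = [[random.randint(0, 1) for _ in range(genome_length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Selection: pick the fitter of two random individuals (tournament).
            mum = max(random.sample(scored, 2))[1]
            dad = max(random.sample(scored, 2))[1]
            # Recombination: single-point crossover.
            point = random.randrange(1, genome_length)
            child = mum[:point] + dad[point:]
            # Mutation: occasional bit flips.
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Example: maximise the number of 1s ("one-max"), a toy fitness function.
best = evolve(fitness=sum)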

GAs differ from conventional optimization/search procedures in that:
1. They work with a coding of the parameter set, not the parameters themselves.
2. They search from a population of points in the problem domain, not a single point.
3. They use payoff (objective function) information rather than derivatives of the problem or other auxiliary knowledge.
4. They use probabilistic transition rules based on fitness rather than deterministic ones.
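The following small Python sketch illustrates several of these points: the GA manipulates a bit-string coding rather than the parameter itself, it keeps a whole population of such codings, and selection is probabilistic and driven only by payoff values. The decoding range and the roulette-wheel selection scheme are illustrative assumptions.

import random

def decode(bits, lo=-5.0, hi=5.0):
    # Point 1: the GA manipulates a coding (a bit string), not the real-valued
    # parameter itself; decoding maps the string back into [lo, hi].
    value = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)

def roulette_select(population, payoffs):
    # Point 4: a probabilistic, fitness-proportional choice rather than a
    # deterministic rule; only payoff values are used (point 3), no derivatives.
    total = sum(payoffs)
    pick = random.uniform(0, total)
    running = 0.0
    for individual, payoff in zip(population, payoffs):
        running += payoff
        if running >= pick:
            return individual
    return population[-1]

# Point 2: the search works with a whole population of coded points at once.
population = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
payoffs = [1.0 / (1.0 + decode(ind) ** 2) for ind in population]  # maximise near x=0
parent = roulette_select(population, payoffs)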



How and where are genetic algorithms applied?

Why a Genetic Algorithm?

Advances in computer technology have made molecular dynamics simulations more and more popular in studying the behavior of complex systems. Even with modern-day computers, however, there are still two main limitations facing atomistic simulations: system size and simulation time. While recent developments in parallel computer design and algorithms have made considerable progress in enlarging the system size that can be accessed using atomistic simulations, methods for shortening the simulation time still remain relatively unexplored.

One example where such methods will be useful is in the determination of the lowest energy configurations of a collection of atoms. Because the number of candidate local energy minima grows exponentially with the number of atoms, the computational effort scales exponentially with problem size, making it a member of the NP-hard problem class.

For a few atoms, the ground state can sometimes be found by a brute force search of configuration space. For up to ten or twenty atoms, depending upon the potential, simulated annealing may be employed to generate some candidate ground state configurations. For more atoms than this, attempts to use simulated annealing to find the global energy minimum are frustrated by high energy barriers which trap the simulation in one of the numerous metastable configurations.

An algorithm is needed which can "hop" from one minimum to another and permit an efficient sampling of phase space. Our approach is based on the genetic algorithm (GA), an optimization strategy inspired by the Darwinian evolution process. Starting with a population of candidate structures, we relax these candidates to the nearest local minimum. Using the relaxed energies as the criterion of fitness, a fraction of the population is selected as "parents." The next generation of candidate structures is produced by "mating" these parents. The process is repeated until the ground state structure is located.
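The potential, the relaxation routine, and the mating operation are not specified above, so the following Python sketch substitutes a toy rugged energy function, naive gradient-descent relaxation, and a crude "take half the coordinates from each parent" mating step. It is only meant to make the relax/select/mate loop concrete, not to reproduce the actual method.

import math
import random

def energy(x):
    # Toy stand-in for a cluster potential: a rugged function with many minima.
    return sum(xi * xi + 2.0 * math.sin(5.0 * xi) for xi in x)

def relax(x, steps=200, lr=0.01, h=1e-5):
    # Crude steepest-descent relaxation towards the nearest local minimum.
    x = list(x)
    for _ in range(steps):
        grad = [(energy(x[:i] + [xi + h] + x[i+1:]) - energy(x)) / h
                for i, xi in enumerate(x)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x

def mate(a, b):
    # Crude stand-in for a structural "cut and splice": take half of the
    # coordinates from each relaxed parent.
    cut = len(a) // 2
    return a[:cut] + b[cut:]

def ga_ground_state(dim=6, pop_size=12, generations=30, keep=4):
    pop = [relax([random.uniform(-2, 2) for _ in range(dim)])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)              # relaxed energy serves as the fitness
        parents = pop[:keep]              # keep the lowest-energy structures
        children = [relax(mate(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - keep)]
        pop = parents + children
    return min(pop, key=energy)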


NP Completeness

Remember, we are considering only decision problems: those for which every instance is either a "yes" instance or a "no" instance. We may identify the problem itself with the set of its "yes" instances. For example, three-processor scheduling (3PS) instances are triples (A,l,D), where A is a set of tasks, l(a) is the time required for each task a, and D is a deadline. The 3PS problem is the set of triples for which some assignment of the tasks in A to three processors meets the deadline D.

A problem A reduces to a problem B, denoted A <= B, when there is a polynomial time function f such that x is in A if and only if f(x) is in B. This says that "A reduces to B" when there is an efficient way to transform instances of A into instances of B such that "yes" instances map to "yes" instances, and "no" instances to "no" instances.

A problem B is NP hard when A <= B for every A in NP, and a problem is NP complete when it is both NP hard and in NP.

Proving NP Completeness

To show that a problem A is NP complete, you need to

1. Show that A is in NP (by giving a polynomial time algorithm for verifying it, given a small "hint" string).
2. Show that B reduces to A for some NP complete problem B.

This works since, if B is NP complete, then any problem in NP reduces to B. But it then reduces to your problem as well, by the reduction from B that you provide.

Naturally, you need an NP complete problem for this to work. Several are available, such as those in earlier discussion on NP.

I will prove that the mPS is NP complete by reducing partition to 2PS and then reducing 2PS to mPS. These problems are:

Partition
Instance: a multiset A and a size function s: A --> N giving the size of each element of A.
Question: can A be partitioned into subsets A0 and A1 such that the sum of s(a) over a in A0 equals the sum of s(a) over a in A1? (That is: can A be partitioned into two subsets of equal total size?)

Multi-processor scheduling (mPS)
Instance: a multiset A of tasks, a function l: A --> N giving the time required for each task, and a deadline (real number) D.
Question: is there a partition of A into m disjoint sets such that the total time (sum of l(a)) within each set is at most D?

Two-processor scheduling (2PS)
Same as mPS, with m = 2.
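To make these decision problems concrete, the following Python sketch checks a proposed schedule (the kind of "hint" used in the NP-membership arguments below) against an mPS instance; the data layout is an illustrative assumption.

def is_valid_schedule(tasks, l, m, D, schedule):
    # tasks: list of task names; l: dict task -> required time;
    # schedule: list of m disjoint lists of tasks (the "hint").
    if len(schedule) != m:
        return False
    flat = [t for proc in schedule for t in proc]
    if sorted(flat) != sorted(tasks):
        return False                      # must be a partition of the tasks
    return all(sum(l[t] for t in proc) <= D for proc in schedule)

# A "yes" instance of 2PS: tasks of sizes 3, 3, 2, 2 fit two processors with D = 5.
tasks = ["a", "b", "c", "d"]
l = {"a": 3, "b": 3, "c": 2, "d": 2}
print(is_valid_schedule(tasks, l, m=2, D=5, schedule=[["a", "c"], ["b", "d"]]))  # True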

Partition is NP complete, as shown by the sequence of reductions: SAT <= 3SAT <= 3DM <= Partition. I will not show these reductions here (or define these problems).

Theorem: 2PS is NP complete.
Proof: The problem is clearly in NP, since it is easy to verify that the total time for the tasks in each of two partitions is at most D, given the partitions as "hints".

Now, we show that Partition <= 2PS in order to conclude that the problem is NP complete.

Let (A,s) be an instance of partition. Let D be half the sum of s(a) over all a in A. Then (A,s,D) is an instance of 2PS.

Now, if (A,s) is a "yes" instance of partition, then let A0 and A1 be a partition such that the sums of s(a) over each are the same. For this choice of D, the sum over each of A0 and A1 is at most (in fact exactly) D. So, (A,s,D) is a "yes" instance of 2PS.

Conversely, if (A,s,D) is a "yes" instance of 2PS, then let A0 and A1 be the sets of tasks scheduled on processors 0 and 1. The sum of the times s(a) over all a in A0 is at most D, and the same is true for the sum over A1, so the two sums together are at most 2D. But A0 and A1 partition A, so the two sums together equal the sum of all s(a) over a in A, which by construction is exactly 2D. Hence each of the two sums is exactly D; they are equal, and so (A,s) is a "yes" instance of partition.

In other words, partition reduces to 2PS, and so 2PS is NP complete.
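The reduction is simple enough to write down directly. The following sketch (with an assumed data representation) builds the 2PS instance from a partition instance and, for a small example, brute-forces both problems to confirm that the answers agree.

from itertools import product

def reduce_partition_to_2ps(A, s):
    # The reduction f: D is half the total size; the tasks and sizes are reused.
    total = sum(s[a] for a in A)
    return A, s, total / 2.0

def partition_yes(A, s):
    # Brute force: does some subset sum to exactly half the total?
    total = sum(s[a] for a in A)
    return any(sum(s[a] for a, bit in zip(A, bits) if bit) * 2 == total
               for bits in product([0, 1], repeat=len(A)))

def two_ps_yes(A, s, D):
    # Brute force: can the tasks be split over two processors within deadline D?
    return any(max(sum(s[a] for a, bit in zip(A, bits) if bit),
                   sum(s[a] for a, bit in zip(A, bits) if not bit)) <= D
               for bits in product([0, 1], repeat=len(A)))

A = ["a", "b", "c", "d"]
s = {"a": 3, "b": 1, "c": 2, "d": 2}
print(partition_yes(A, s), two_ps_yes(*reduce_partition_to_2ps(A, s)))  # True True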

Now we show that mPS is NP complete, by reducing 2PS to it.

Theorem: mPS is NP complete.
Proof:

That mPS is in NP is obvious, by a similar argument to the above one: the "hint" is a schedule, the verification is just adding up times on each processor and comparing to D.

Now, we reduce an arbitrary instance of 2PS, (A,s,D) to an instance of mPS.

For instance (A,s,D), form an m-processor instance (A',s',D) by letting A' = A union {v} union ... union {v}, where:

m-2 copies of {v} are added in the unions,
v is not in A,
s'(v) = D, and s'(x) = s(x) for all x in A.

Now, if (A,s,D) is a "yes" instance, then the same partition of A that works in the two processor case will work in the m processor case, with the other m-2 processors being assigned {v}, with a cost of at most D for all processors.

Conversely, any schedule for the m-processor case can be transformed into one for the two-processor case. Take any such schedule A1, ..., Am. None of these sets can contain both v and x, for any non-v x, since its weight would then be s'(v) + s'(x) = D + s(x) > D; nor can any set contain two copies of v, since their combined weight 2D would exceed D (assuming D > 0). So exactly m-2 of these sets contain a copy of v, since there are m-2 copies of v. Let Ai and Aj be the two remaining sets (m-(m-2) = 2 of them) which contain only non-v elements. By construction, the elements of Ai and Aj are precisely the elements of A, and so Ai and Aj form a schedule showing that (A,s,D) is a "yes" instance of 2PS.
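The construction of the m-processor instance can likewise be written out. In the sketch below the m-2 extra tasks are given fresh names (an illustrative choice); the sizes are extended exactly as in the proof.

def reduce_2ps_to_mps(A, s, D, m):
    # Add m-2 new "blocker" tasks, each of size exactly D, so that m-2
    # processors are fully occupied and the remaining two must schedule A.
    blockers = ["v%d" % i for i in range(m - 2)]   # names chosen to avoid clashes with A
    A_prime = list(A) + blockers
    s_prime = dict(s)
    for v in blockers:
        s_prime[v] = D
    return A_prime, s_prime, D

A = ["a", "b", "c", "d"]
s = {"a": 3, "b": 1, "c": 2, "d": 2}
print(reduce_2ps_to_mps(A, s, D=4, m=4))
# (['a', 'b', 'c', 'd', 'v0', 'v1'], {'a': 3, 'b': 1, 'c': 2, 'd': 2, 'v0': 4, 'v1': 4}, 4)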

Consequences of NP completeness

Here is a major consequence of NP completeness. Suppose that A is an NP complete problem (such as three-processor scheduling). Then

If A is in P, then P=NP. That is, if there is a feasible algorithm to solve A, then there are feasible algorithms for every single NP complete problem.

Since no one has been able to find a fast algorithm for any NP complete problem, it is unlikely that all those problems have fast solutions and all those smart people just missed them. So, we assume that no NP complete problem is feasible.

Note, this is an article of faith. We do not know that P is different from NP, but it does seem to be the case.


Genetic Programming Definitions

Automatically Defined Function : a function (subroutine, procedure, module) that is dynamically evolved during a run of genetic programming and which may be invoked by a calling program (main program) that is simultaneously being evolved.

Genetic Programming (Koza's Definition) : the domain-independent genetic programming paradigm capable of evolving computer programs that solve, or approximately solve, a variety of problems from a variety of fields.

Terminal Set : the initially defined set of parameters (variables or inputs) to be used within a run of GP in an attempt to evolve a yet-to-be-discovered computer program.

Function Set : the initially defined set of functions (arithmetic, logic, comparison, standard programming operations, or domain-specific) to be used within a run of genetic programming.

Sufficiency Requirement : the requirement that the terminal and function sets defined for a particular problem, together, must be capable of expressing a solution to the problem being addressed.

Closure Requirement : the requirement that every function defined in the function set of a problem must be able to accept, as its arguments, any value that may possibly be returned by any function in the function set and any value that may possibly be assumed by any terminal in the terminal set.

Structural Complexity : measures the size of a program. It is a count of the number of times that the functions from the function set and the terminals from the terminal set appear in a program.

Computational Effort : the computational effort, E, required to yield a solution or satisfactory result to a problem with a satisfactorily high probability, z, given a fixed population size, M, and a fixed maximum number of generations, i, is based on the number of fitness evaluations of individuals, I (for GP: individual = "chromosome" = program), that must be executed to yield a qualified solution for the problem by the ith generation. Stated as: E = I(M, i, z).
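One common way of making this measure operational, following Koza's formulation (stated here as an assumption, since the definition above only names E = I(M, i, z)), is to estimate the cumulative probability of success P(M, i) from repeated runs and then compute the number of independent runs, and hence fitness evaluations, needed to reach success probability z.

import math

def computational_effort(success_by_gen, M, z=0.99):
    # success_by_gen[i] = fraction of runs that found a solution by generation i,
    # estimated empirically from repeated runs (the cumulative probability P(M, i)).
    best = None
    for i, p in enumerate(success_by_gen):
        if 0.0 < p < 1.0:
            runs_needed = math.ceil(math.log(1.0 - z) / math.log(1.0 - p))
        elif p >= 1.0:
            runs_needed = 1
        else:
            continue                          # no successes yet by generation i
        effort = M * (i + 1) * runs_needed    # individuals processed: I(M, i, z)
        if best is None or effort < best:
            best = effort
    return best

# Example: population of 500, success probabilities estimated over many runs.
print(computational_effort([0.0, 0.05, 0.2, 0.45, 0.6, 0.7], M=500))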

Fitness Measure : evaluates how well each individual computer program in the population performs in the problem environment.

Termination Criteria : the method of result designation for deciding when to terminate the run.


Definitions and Terms

1

1/5 SUCCESS RULE: Derived by I. Rechenberg, the suggestion that when Gaussian MUTATIONs are applied to real-valued vectors in searching for the minimum of a function, a rule-of-thumb to attain good rates of error convergence is to adapt the STANDARD DEVIATION of mutations to generate one superior solution out of every five attempts.
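A minimal (1+1) evolution strategy sketch applying this rule is given below; the adaptation factor of 0.85, the window of 20 mutations, and the sphere test function are illustrative assumptions.

import random

def one_plus_one_es(f, x, sigma=1.0, generations=1000, window=20, c=0.85):
    # Minimise f by Gaussian mutation of a real-valued vector, adapting the
    # standard deviation so that roughly 1 in 5 mutations succeeds.
    successes = 0
    for g in range(1, generations + 1):
        child = [xi + random.gauss(0.0, sigma) for xi in x]
        if f(child) < f(x):
            x, successes = child, successes + 1
        if g % window == 0:
            rate = successes / window
            if rate > 0.2:
                sigma /= c        # too many successes: widen the search
            elif rate < 0.2:
                sigma *= c        # too few successes: narrow the search
            successes = 0
    return x

sphere = lambda v: sum(vi * vi for vi in v)      # simple test function
print(one_plus_one_es(sphere, [5.0, -3.0, 2.0]))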

A

ADAPTIVE BEHAVIOUR: "...underlying mechanisms that allow animals, and potentially, ROBOTs to adapt and survive in uncertain environments" --- Meyer & Wilson (1991), [SAB90]

AI: See ARTIFICIAL INTELLIGENCE.

ALIFE: See ARTIFICIAL LIFE.

ALLELE : (biol) Each GENE is able to occupy only a particular region of a CHROMOSOME, its locus. At any given locus there may exist, in the POPULATION, alternative forms of the gene. These alternatives are called alleles of one another. (EC) The value of a gene. Hence, for a binary representation, each gene may have an ALLELE of 0 or 1.

ARTIFICIAL INTELLIGENCE: "...the study of how to make computers do things at which, at the moment, people are better" --- Elaine Rich (1988)

ARTIFICIAL LIFE: Term coined by Christopher G. Langton for his 1987 [ALIFEI] conference. In the preface of the proceedings he defines ALIFE as "...the study of simple computer generated hypothetical life forms, i.e. life-as-it-could-be."

B

BUILDING BLOCK: (EC) A small, tightly clustered group of GENEs which have co-evolved in such a way that their introduction into any CHROMOSOME will be likely to give increased FITNESS to that chromosome.

C

CENTRAL DOGMA: (biol) The dogma that nucleic acids act as templates for the synthesis of proteins, but never the reverse. More generally, the dogma that GENEs exert an influence over the form of a body, but the form of a body is never translated back into genetic code: acquired characteristics are not inherited. cf LAMARCKISM.

CFS: See CLASSIFIER SYSTEM.

CHROMOSOME: (biol) One of the chains of DNA found in cells. CHROMOSOMEs contain GENEs, each encoded as a subsection of the DNA chain. Chromosomes are usually present in all cells in an organism, even though only a minority of them will be active in any one cell. (EC) A datastructure which holds a `string' of task parameters, or genes. This may be stored, for example, as a binary bit- string, or an array of integers.

CLASSIFIER SYSTEM: A system which takes a (set of) inputs, and produces a (set of) outputs which indicate some classification of the inputs. An example might take inputs from sensors in a chemical plant, and classify them in terms of: 'running ok', 'needs more water', 'needs less water', 'emergency'. See Q1.4 for more information.

COMBINATORIAL OPTIMIZATION: Some tasks involve combining a set of entities in a specific way (e.g. the task of building a house). A general combinatorial task involves deciding (a) the specifications of those entities (e.g. what size, shape, material to make the bricks from), and (b) the way in which those entities are brought together (e.g. the number of bricks, and their relative positions). If the resulting combination of entities can in some way be given a FITNESS score, then COMBINATORIAL OPTIMIZATION is the task of designing a set of entities, and deciding how they must be configured, so as to give maximum fitness. cf ORDER-BASED PROBLEM.

COMMA STRATEGY: Notation originally proposed in EVOLUTION STRATEGIEs, when a POPULATION of "mu" PARENTs generates "lambda" OFFSPRING and the mu parents are discarded, leaving only the lambda INDIVIDUALs to compete directly. Such a process is written as a (mu,lambda) search. The process of competing only the offspring is then a "comma strategy." cf. PLUS STRATEGY.

CONVERGED: A GENE is said to have CONVERGED when 95% of the CHROMOSOMEs in the POPULATION all contain the same ALLELE for that gene. In some circumstances, a population can be said to have converged when all genes have converged. (However, this is not true of populations containing multiple SPECIES, for example.) Most people use "convergence" fairly loosely, to mean "the GA has stopped finding new, better solutions". Of course, if you wait long enough, the GA will *eventually* find a better solution (unless you have already found the global optimum). What people really mean is "I'm not willing to wait for the GA to find a new, better solution, because I've already waited longer than I wanted to and it hasn't improved in ages."

CONVERGENCE VELOCITY: The rate of error reduction.

COOPERATION: The behavior of two or more INDIVIDUALs acting to increase the gains of all participating individuals.

CROSSOVER: (EC) A REPRODUCTION OPERATOR which forms a new CHROMOSOME by combining parts of each of two `parent' chromosomes. The simplest form is single-point CROSSOVER, in which an arbitrary point in the chromosome is picked. All the information from PARENT A is copied from the start up to the crossover point, then all the information from parent B is copied from the crossover point to the end of the chromosome. The new chromosome thus gets the head of one parent's chromosome combined with the tail of the other. Variations exist which use more than one crossover point, or combine information from parents in other ways.
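A minimal Python sketch of the single-point variant described above (the multi-point and uniform variants differ only in how the copied ranges are chosen):

import random

def single_point_crossover(parent_a, parent_b):
    # Pick an arbitrary crossover point, then take the head of one parent
    # and the tail of the other.
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

child = single_point_crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1])
# e.g. [0, 0, 1, 1, 1, 1] if the crossover point happened to be 2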

CS: See CLASSIFIER SYSTEM.

D

DARWINISM: (biol) Theory of EVOLUTION, proposed by Darwin, that evolution comes about through random variation of heritable characteristics, coupled with natural SELECTION (survival of the fittest). A physical mechanism for this, in terms of GENEs and CHROMOSOMEs, was discovered many years later. DARWINISM was combined with the selectionism of Weismann and the genetics of Mendel to form the Neo-Darwinian Synthesis during the 1930s-1950s by T. Dobzhansky, E. Mayr, G. Simpson, R. Fisher, S. Wright, and others. cf LAMARCKISM.

DECEPTION: The condition where the combination of good BUILDING BLOCKs leads to reduced FITNESS, rather than increased fitness. Proposed by [GOLD89] as a reason for the failure of GAs on many tasks.

DIPLOID: (biol) This refers to a cell which contains two copies of each CHROMOSOME. The copies are homologous i.e. they contain the same GENEs in the same sequence. In many sexually reproducing SPECIES, the genes in one of the sets of chromosomes will have been inherited from the father's GAMETE (sperm), while the genes in the other set of chromosomes are from the mother's gamete (ovum).

DNA: (biol) Deoxyribonucleic Acid, a double stranded macromolecule of helical structure (comparable to a spiral staircase). Both single strands are linear, unbranched nucleic acid molecules build up from alternating deoxyribose (sugar) and phosphate molecules. Each deoxyribose part is coupled to a nucleotide base, which is responsible for establishing the connection to the other strand of the DNA. The 4 nucleotide bases Adenine (A), Thymine (T), Cytosine (C) and Guanine (G) are the alphabet of the genetic information. The sequences of these bases in the DNA molecule determines the building plan of any organism. [eds note: suggested reading: James D. Watson (1968) "The Double Helix", London: Weidenfeld and Nicholson]

DNS: (biol) Desoxyribonukleinsaeure, German for DNA.

E

EC: See EVOLUTIONARY COMPUTATION.

ELITISM: ELITISM (or an elitist strategy) is a mechanism which is employed in some EAs which ensures that the CHROMOSOMEs of the most highly fit member(s) of the POPULATION are passed on to the next GENERATION without being altered by GENETIC OPERATORs. Using elitism ensures that the maximum FITNESS of the population can never decrease from one generation to the next. Elitism usually brings about a more rapid convergence of the population. In some applications elitism improves the chances of locating an optimal INDIVIDUAL, while in others it reduces them.

ENCORE: The EvolutioNary Computation REpository Network. A collection of FTP servers/World Wide Web sites holding all manner of interesting things related to EC. See Q15.3 for more information.

ENVIRONMENT: (biol) That which surrounds an organism. Can be 'physical' (abiotic), or biotic. In both, the organism occupies a NICHE which influences its FITNESS within the total ENVIRONMENT. A biotic environment may present frequency-dependent fitness functions within a POPULATION, that is, the fitness of an organism's behaviour may depend upon how many others are also doing it. Over several GENERATIONs, biotic environments may foster co-evolution, in which fitness is determined with SELECTION partly by other SPECIES.

EP: See EVOLUTIONARY PROGRAMMING.

EPISTASIS: (biol) A "masking" or "switching" effect among GENEs. A biology textbook says: "A gene is said to be epistatic when its presence suppresses the effect of a gene at another locus. Epistatic genes are sometimes called inhibiting genes because of their effect on other genes which are described as hypostatic."

ES: See EVOLUTION STRATEGY.

EVOLUTION: That process of change which is assured given a reproductive POPULATION in which there are (1) varieties of INDIVIDUALs, with some varieties being (2) heritable, of which some varieties (3) differ in FITNESS (reproductive success). (See the talk.origins FAQ for discussion on this (See Q10.7).)

EVOLUTION STRATEGY: A type of evolutionary algorithm developed in the early 1960s in Germany. It employs real-coded parameters, and in its original form, it relied on MUTATION as the search operator, and a POPULATION size of one. Since then it has evolved to share many features with GENETIC ALGORITHMs. See Q1.3 for more information.

EVOLUTIONARILY STABLE STRATEGY: A strategy that does well in a POPULATION dominated by the same strategy. (cf Maynard Smith, 1974) Or, in other words, "An 'ESS' ... is a strategy such that, if all the members of a population adopt it, no mutant strategy can invade." (Maynard Smith "Evolution and the Theory of Games", 1982).

EVOLUTIONARY COMPUTATION: Encompasses methods of simulating EVOLUTION on a computer. The term is relatively new and represents an effort to bring together researchers who have been working in closely related fields but following different paradigms. The field is now seen as including research in GENETIC ALGORITHMs, EVOLUTION STRATEGIEs, EVOLUTIONARY PROGRAMMING, ARTIFICIAL LIFE, and so forth. For a good overview see the editorial introduction to Vol. 1, No. 1 of "Evolutionary Computation" (MIT Press, 1993). That, along with the papers in the issue, should give you a good idea of representative research.

EVOLUTIONARY PROGRAMMING: An evolutionary algorithm developed in the mid 1960s. It is a stochastic OPTIMIZATION strategy which is similar to GENETIC ALGORITHMs, but dispenses both with "genomic" representations and with CROSSOVER as a REPRODUCTION OPERATOR. See Q1.2 for more information.

EVOLUTIONARY SYSTEMS: A process or system which employs the evolutionary dynamics of REPRODUCTION, MUTATION, competition and SELECTION. The specific forms of these processes are irrelevant to a system being described as "evolutionary."

EXPECTANCY: Or expected value. Pertaining to a random variable X, for a continuous random variable, the expected value is: E(X) = INTEGRAL(-inf, inf) [X f(X) dX]. The discrete expectation takes a similar form using a summation instead of an integral.

EXPLOITATION: When traversing a SEARCH SPACE, EXPLOITATION is the process of using information gathered from previously visited points in the search space to determine which places might be profitable to visit next. An example is hillclimbing, which investigates adjacent points in the search space, and moves in the direction giving the greatest increase in FITNESS. Exploitation techniques are good at finding local maxima.

EXPLORATION: The process of visiting entirely new regions of a SEARCH SPACE, to see if anything promising may be found there. Unlike EXPLOITATION, EXPLORATION involves leaps into the unknown. Problems which have many local maxima can sometimes only be solved by this sort of random search.

F

FITNESS: (biol) Loosely: adaptedness. Often measured as, and sometimes equated to, relative reproductive success. Also proportional to expected time to extinction. "The fit are those who fit their existing ENVIRONMENTs and whose descendants will fit future environments." (J. Thoday, "A Century of Darwin", 1959). Accidents of history are relevant.

FUNCTION OPTIMIZATION: For a function which takes a set of N input parameters, and returns a single output value, F, FUNCTION OPTIMIZATION is the task of finding the set(s) of parameters which produce the maximum (or minimum) value of F. Function OPTIMIZATION is a type of VALUE-BASED PROBLEM.

FTP: File Transfer Protocol. A system which allows the retrieval of files stored on a remote computer. Basic FTP requires a password before access can be gained to the remote computer. Anonymous FTP does not require a special password: after giving "anonymous" as the user name, any password will do (typically, you give your email address at this point). Files available by FTP are specified as <ftp-site-name>:<the-complete-filename> See Q15.5.

FUNCTION SET: (GP) The set of operators used in GP. These functions label the internal (non-leaf) points of the parse trees that represent the programs in the POPULATION. An example FUNCTION SET might be {+, -, *}.

G

GA: See GENETIC ALGORITHM.

GAME THEORY: A mathematical theory originally developed for human games, and generalized to human economics and military strategy, and to EVOLUTION in the theory of the EVOLUTIONARILY STABLE STRATEGY. GAME THEORY comes into its own wherever the optimum policy is not fixed, but depends upon the policy which is statistically most likely to be adopted by opponents.

GAMETE: (biol) Cells which carry genetic information from their PARENTs for the purposes of sexual REPRODUCTION. In animals, male GAMETEs are called sperm, female gametes are called ova. Gametes have a HAPLOID number of CHROMOSOMEs.

GAUSSIAN DISTRIBUTION: See NORMALLY DISTRIBUTED.

GENE: (EC) A subsection of a CHROMOSOME which (usually) encodes the value of a single parameter.

GENE-POOL: The whole set of GENEs in a breeding POPULATION. The metaphor on which the term is based de-emphasizes the undeniable fact that genes actually go about in discrete bodies, and emphasizes the idea of genes flowing about the world like a liquid.

GENERATION: (EC) An iteration of the measurement of FITNESS and the creation of a new POPULATION by means of REPRODUCTION OPERATORs.

GENETIC ALGORITHM: A type of EVOLUTIONARY COMPUTATION devised by John Holland [HOLLAND92]. A model of machine learning that uses a genetic/evolutionary metaphor. Implementations typically use fixed-length character strings to represent their genetic information, together with a POPULATION of INDIVIDUALs which undergo CROSSOVER and MUTATION in order to find interesting regions of the SEARCH SPACE. See Q1.1 for more information.

GENETIC DRIFT: Changes in gene/allele frequencies in a POPULATION over many GENERATIONs, resulting from chance rather than SELECTION. Occurs most rapidly in small populations. Can lead to some ALLELEs becoming `extinct', thus reducing the genetic variability in the population.

GENETIC PROGRAMMING: GENETIC ALGORITHMs applied to programs. GENETIC PROGRAMMING is more expressive than fixed-length character string GAs, though GAs are likely to be more efficient for some classes of problems. See Q1.5 for more information.

GENETIC OPERATOR: A search operator acting on a coding structure that is analogous to a GENOTYPE of an organism (e.g. a CHROMOSOME).

GENOTYPE: The genetic composition of an organism: the information contained in the GENOME.

GENOME: The entire collection of GENEs (and hence CHROMOSOMEs) possessed by an organism.

GLOBAL OPTIMIZATION: The process by which a search is made for the extremum (or extrema) of a functional which, in EVOLUTIONARY COMPUTATION, corresponds to the FITNESS or error function that is used to assess the PERFORMANCE of any INDIVIDUAL.

GP: See GENETIC PROGRAMMING.

H

HAPLOID: (biol) This refers to a cell which contains a single CHROMOSOME or set of chromosomes, each consisting of a single sequence of GENEs. An example is a GAMETE. cf DIPLOID. In EC, it is usual for INDIVIDUALs to be HAPLOID.

HARD SELECTION: SELECTION acts on competing INDIVIDUALs. When only the best available individuals are retained for generating future progeny, this is termed "hard selection." In contrast, "soft selection" offers a probabilistic mechanism for maintaining individuals to be PARENTs of future progeny despite possessing relatively poorer objective values.

I

INDIVIDUAL: A single member of a POPULATION. In EC, each INDIVIDUAL contains a CHROMOSOME (or, more generally, a GENOME) which represents a possible solution to the task being tackled, i.e. a single point in the SEARCH SPACE. Other information is usually also stored in each individual, e.g. its FITNESS.

INVERSION: (EC) A REORDERING operator which works by selecting two cut points in a CHROMOSOME, and reversing the order of all the GENEs between those two points.

L

LAMARCKISM: Theory of EVOLUTION which preceded Darwin's. Lamarck believed that evolution came about through the inheritance of acquired characteristics. That is, the skills or physical features which an INDIVIDUAL acquires during its lifetime can be passed on to its OFFSPRING. Although Lamarckian inheritance does not take place in nature, the idea has been usefully applied by some in EC. cf DARWINISM.

LCS: See LEARNING CLASSIFIER SYSTEM.

LEARNING CLASSIFIER SYSTEM: A CLASSIFIER SYSTEM which "learns" how to classify its inputs. This often involves "showing" the system many examples of input patterns, and their corresponding correct outputs. See Q1.4 for more information.

M

MIGRATION: The transfer of (the GENEs of) an INDIVIDUAL from one SUB-POPULATION to another.

MOBOT: MOBile ROBOT. cf ROBOT.

MUTATION: (EC) a REPRODUCTION OPERATOR which forms a new CHROMOSOME by making (usually small) alterations to the values of GENEs in a copy of a single, PARENT chromosome.
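A minimal bit-flip mutation sketch in Python for a binary chromosome; the per-gene mutation rate is an illustrative choice.

import random

def mutate(chromosome, rate=0.01):
    # Copy the parent chromosome, flipping each bit with a small probability.
    return [1 - gene if random.random() < rate else gene for gene in chromosome]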

N

NFL: See NO FREE LUNCH.

NICHE: (biol) In natural ecosystems, there are many different ways in which animals may survive (grazing, hunting, on the ground, in trees, etc.), and each survival strategy is called an "ecological niche." SPECIES which occupy different NICHEs (e.g. one eating plants, the other eating insects) may coexist side by side without competition, in a stable way. But if two species occupying the same niche are brought into the same area, there will be competition, and eventually the weaker of the two species will be made (locally) extinct. Hence diversity of species depends on them occupying a diversity of niches (or on geographical separation).

NO FREE LUNCH: For any pair of search algorithms, there are "as many" problems for which the first algorithm outperforms the second as for which the reverse is true. One consequence of this is that if we don't put any domain knowledge into our algorithm, it is as likely to perform worse than random search as it is likely to perform better. This is true for all algorithms, including GENETIC ALGORITHMs.

NORMALLY DISTRIBUTED: A random variable is NORMALLY DISTRIBUTED if its density function is described as f(x) = 1/sqrt(2*pi*sqr(sigma)) * exp(-0.5*(x-mu)*(x-mu)/sqr(sigma)) where mu is the mean of the random variable x and sigma is the STANDARD DEVIATION.

O

OBJECT VARIABLES: Parameters that are directly involved in assessing the relative worth of an INDIVIDUAL.

OFFSPRING: An INDIVIDUAL generated by any process of REPRODUCTION.

OPTIMIZATION: The process of iteratively improving the solution to a problem with respect to a specified objective function.

ORDER-BASED PROBLEM: A problem where the solution must be specified in terms of an arrangement (e.g. a linear ordering) of specific items, e.g. TRAVELLING SALESMAN PROBLEM, computer process scheduling. ORDER-BASED PROBLEMs are a class of COMBINATORIAL OPTIMIZATION problems in which the entities to be combined are already determined. cf VALUE-BASED PROBLEM.

ONTOGENESIS: Refers to a single organism, and means the life span of an organism from its birth to death. cf PHYLOGENESIS.

P

PANMICTIC POPULATION: (EC, biol) A mixed POPULATION. A population in which any INDIVIDUAL may be mated with any other individual with a probability which depends only on FITNESS. Most conventional evolutionary algorithms have PANMICTIC POPULATIONs. The opposite is a population divided into groups known as SUB-POPULATIONs, where individuals may only mate with others in the same sub-population. cf SPECIATION.

PARENT: An INDIVIDUAL which takes part in REPRODUCTION to generate one or more other individuals, known as OFFSPRING, or children.

PERFORMANCE: cf FITNESS.

PHENOTYPE: The expressed traits of an INDIVIDUAL.

PHYLOGENESIS: Refers to a POPULATION of organisms. The life span of a population of organisms from pre-historic times until today. cf ONTOGENESIS.

PLUS STRATEGY: Notation originally proposed in EVOLUTION STRATEGIEs, when a POPULATION of "mu" PARENTs generates "lambda" OFFSPRING and all mu and lambda INDIVIDUALs compete directly, the process is written as a (mu+lambda) search. The process of competing all parents and offspring then is a "plus strategy." cf. COMMA STRATEGY.
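The difference between the comma and plus notations is just the pool from which survivors are drawn. A minimal Python sketch (the fitness function is left abstract, and lower fitness is assumed to be better; reverse the sort for maximisation):

def select_survivors(parents, offspring, mu, fitness, plus=False):
    # (mu, lambda): survivors come from the offspring only.
    # (mu + lambda): parents and offspring compete together.
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=fitness)[:mu]    # keep the mu best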

POPULATION: A group of INDIVIDUALs which may interact together, for example by mating, producing OFFSPRING, etc. Typical POPULATION sizes in EC range from 1 (for certain EVOLUTION STRATEGIEs) to many thousands (for GENETIC PROGRAMMING). cf SUB-POPULATION.

R

RECOMBINATION: cf CROSSOVER.

REORDERING: (EC) A REORDERING operator is a REPRODUCTION OPERATOR which changes the order of GENEs in a CHROMOSOME, with the hope of bringing related genes closer together, thereby facilitating the production of BUILDING BLOCKs. cf INVERSION.

REPRODUCTION: (biol, EC) The creation of a new INDIVIDUAL from two PARENTs (sexual REPRODUCTION). Asexual reproduction is the creation of a new individual from a single parent.

REPRODUCTION OPERATOR: (EC) A mechanism which influences the way in which genetic information is passed on from PARENT(s) to OFFSPRING during REPRODUCTION. REPRODUCTION OPERATORs fall into three broad categories: CROSSOVER, MUTATION and REORDERING operators.

REQUISITE VARIETY: In GENETIC ALGORITHMs, when the POPULATION fails to have a "requisite variety" CROSSOVER will no longer be a useful search operator because it will have a propensity to simply regenerate the PARENTs.

ROBOT: "The Encyclopedia Galactica defines a ROBOT as a mechanical apparatus designed to do the work of man. The marketing division of the Sirius Cybernetics Corporation defines a robot as `Your Plastic Pal Who's Fun To Be With'." --- Douglas Adams (1979)

S

SAFIER: An EVOLUTIONARY COMPUTATION FTP Repository, now defunct. Superseded by ENCORE.

SCHEMA: A pattern of GENE values in a CHROMOSOME, which may include `don't care' states. Thus in a binary chromosome, each SCHEMA (plural schemata) can be specified by a string of the same length as the chromosome, with each character one of {0, 1, #}. A particular chromosome is said to `contain' a particular schema if it matches the schema (e.g. chromosome 01101 matches schema #1#0#). The `order' of a schema is the number of non-`don't care' positions specified, while the `defining length' is the distance between the furthest two non-`don't care' positions. Thus #1##0# is of order 2 and defining length 3.
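The matching, order, and defining-length notions are easy to state directly; a small Python sketch using the examples above:

def matches(chromosome, schema):
    # A chromosome matches a schema if it agrees at every non-'#' position.
    return all(s == "#" or c == s for c, s in zip(chromosome, schema))

def order(schema):
    return sum(1 for s in schema if s != "#")

def defining_length(schema):
    fixed = [i for i, s in enumerate(schema) if s != "#"]
    return max(fixed) - min(fixed) if fixed else 0

print(matches("01101", "#1#0#"))                    # True, as in the example above
print(order("#1##0#"), defining_length("#1##0#"))   # 2 3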

SCHEMA THEOREM: Theorem devised by Holland [HOLLAND92] to explain the behaviour of GAs. In essence, it says that a GA gives exponentially increasing reproductive trials to above average schemata. Because each CHROMOSOME contains a great many schemata, the rate of SCHEMA processing in the POPULATION is very high, leading to a phenomenon known as implicit parallelism. This gives a GA with a population of size N a speedup by a factor of N cubed, compared to a random search.

SEARCH SPACE: If the solution to a task can be represented by a set of N real- valued parameters, then the job of finding this solution can be thought of as a search in an N-dimensional space. This is referred to simply as the SEARCH SPACE. More generally, if the solution to a task can be represented using a representation scheme, R, then the search space is the set of all possible configurations which may be represented in R.

SEARCH OPERATORS: Processes used to generate new INDIVIDUALs to be evaluated. SEARCH OPERATORS in GENETIC ALGORITHMs are typically based on CROSSOVER and point MUTATION. Search operators in EVOLUTION STRATEGIEs and EVOLUTIONARY PROGRAMMING typically follow from the representation of a solution and often involve Gaussian or lognormal perturbations when applied to real-valued vectors.

SELECTION: The process by which some INDIVIDUALs in a POPULATION are chosen for REPRODUCTION, typically on the basis of favoring individuals with higher FITNESS.

SELF-ADAPTATION: The inclusion of a mechanism not only to evolve the OBJECT VARIABLES of a solution, but simultaneously to evolve information on how each solution will generate new OFFSPRING.

SIMULATION: The act of modeling a natural process.

SOFT SELECTION: The mechanism which allows inferior INDIVIDUALs in a POPULATION a non-zero probability of surviving into future GENERATIONs. See HARD SELECTION.

SPECIATION: (biol) The process whereby a new SPECIES comes about. The most common cause of SPECIATION is that of geographical isolation. If a SUB-POPULATION of a single species is separated geographically from the main POPULATION for a sufficiently long time, their GENEs will diverge (either due to differences in SELECTION pressures in different locations, or simply due to GENETIC DRIFT). Eventually, genetic differences will be so great that members of the sub-population must be considered as belonging to a different (and new) species.

SPECIES: (biol) There is no universally-agreed firm definition of a SPECIES. A species may be roughly defined as a collection of living creatures, having similar characteristics, which can breed together to produce viable OFFSPRING similar to their PARENTs. Members of one species occupy the same ecological NICHE. (Members of different species may occupy the same, or different niches.)

STANDARD DEVIATION: A measurement for the spread of a set of data; a measurement for the variation of a random variable.

STATISTICS: Descriptive measures of data; a field of mathematics that uses probability theory to gain insight into systems' behavior.

STEPSIZE: Typically, the average distance in the appropriate space between a PARENT and its OFFSPRING.

STRATEGY VARIABLE: Evolvable parameters that relate the distribution of OFFSPRING from a PARENT.

SUB-POPULATION: A POPULATION may be sub-divided into groups, known as SUB-POPULATIONs, where INDIVIDUALs may only mate with others in the same group. (This technique might be chosen for parallel processors.) Such sub-divisions may markedly influence the evolutionary dynamics of a population (e.g. Wright's 'shifting balance' model). Sub-populations may be defined by various MIGRATION constraints: islands with limited arbitrary migration; stepping-stones with migration to neighboring islands; isolation-by-distance in which each individual mates only with near neighbors. cf PANMICTIC POPULATION, SPECIATION.

SUMMERSCHOOL: (USA) One of the most interesting things in the US educational system: class work during the summer break.

T

TERMINAL SET: (GP) The set of terminal (leaf) nodes in the parse trees representing the programs in the POPULATION. A terminal might be a variable, such as X, a constant value, such as 42, (cf Q42) or a function taking no arguments, such as (move-north).

TRAVELLING SALESMAN PROBLEM: The travelling salesperson has the task of visiting a number of clients, located in different cities. The problem to solve is: in what order should the cities be visited in order to minimise the total distance travelled (including returning home)? This is a classical example of an ORDER-BASED PROBLEM.

TSP: See TRAVELLING SALESMAN PROBLEM.

V

VALUE-BASED PROBLEM: A problem where the solution must be specified in terms of a set of real-valued parameters. FUNCTION OPTIMIZATION problems are of this type. cf SEARCH SPACE, ORDER-BASED PROBLEM.

VECTOR OPTIMIZATION: Typically, an OPTIMIZATION problem wherein multiple objectives must be satisfied.

Z

ZEN NAVIGATION: A methodology with a tremendous propensity for getting lost during a hike from A to B. ZEN NAVIGATION simply consists in finding something that looks as if it knows where it is going, and following it. The results are more often surprising than successful, but it's usually worth it for the sake of the few occasions when it is both. Sometimes zen navigation is referred to as "doing scientific research," where A is a state of mind considered as pre-PhD, and B is a (usually different) state of mind known as post-PhD. The time spent in state C, somewhere in between A and B, is usually referred to as "being a nobody."

