1.2 - Genetics Algorithm - Problem Specification

1.2.1 - Introduction

We have a set of teachers who needs a schedule for a set of subjects over a set of periods, this schedule must take into account characteristics of periods of week, the limited availability of teacher and rooms, and some constraints which should be fulfilled.

The problem of assigning teachers to subjects and groups for timetabling problem has been studied by many authors. Tillet (1975) was the first to address it, using an integer linear programming model.

Then other authors like Shih and Sullivan (1977), Schniederjans and Kim (1987) and also Badri et al. (1998) developed linear programming models a different approach is followed by Dyer and Mulvey (1976), Bloomfield and McSharry (1979), Mulvey (1982) and Dinkel et al. (1989) who developed capacitated flow models for the problem.

1.2.2 - Problem Description

Our timetabling problem has the following elements:

  • A set of groups:
  • A set of subjects:
  • A set of teachers: we have to types of them:
  1. Full time teachers: are all week periods free teachers, they can attend at any time and have no forbidden periods at any time of the week.
  2. Part time teachers: they have certain periods to attend in, we must respect those periods in assigning subject.
  • A set of periods of the school week: the week is divided into five days each day has Five periods the total number of periods for each group is 5*5=25 period, it means a maximum of 30 lessons per week to each group.

1.2.3 - Problem Elements (Objectives and Constraints of the Problem)

In a course timetabling problem, generally, constraints are considered in two types. One of them is called hard constraints. Every acceptable timetable must satisfy these Constraints, they are:

  • (H1) Every lesson has to be assigned to a period or a number of periods, depending on the lesson’s length.
  • (H2) No teacher can give two simultaneous lessons.
  • (H3) No room can be assigned to two simultaneous lessons.
  • (H4) No teacher can give a subject which is out of his specialist, or the set of subjects he can give.
  • (H5) Forbidden periods of the teacher must be respected (no part time teacher can give out of his available times.
  • (H6) The teacher of a specified subject of a certain group must be unique.

The other type of constraints is the soft type, without those constraints an acceptable solution will be generated but the more we stick to those constraints the higher quality of the solution will be fulfilled.

Those soft constraints are:

  • (S1) All lessons of the same section cannot be assigned to the same day must be respected.
  • (S2) Class timetables should be as compact as possible, eliminating idle times for students.
  • (S3) Class timetables should be as compact as possible, eliminating idle times for students.
  • (S4) In class timetables, if possible students should not have a day with a single lesson.

1.2.4 - Genetic Algorithm Application

1.2.4.1 - Flow Chart

  1. Choose the initial population.
  2. Select parent chromosomes.
  3. Perform Crossover.
  4. Perform Mutation.
  5. Evaluate fitness of the new population.
  6. Repeat 2 until satisfied.

1.2.4.2 - Elements of Our Solution

  1. Nod x: which is a set of variables x (R, S, T, P)R: represents room
    S: represents subject
    T: represents teacher
    P: represents period
    the number of those variables is equal to the number of all lessons of all groups of all classes.
  2. The initial solution (chromosomes):
    We start working from the initial solution, which is the totally randomly generated population. When starting working we can reach the newer and newer population with better fitness function results, until totally satisfying the hard constraints. When all the hard constraints are satisfied we start working on the soft constraints until they are satisfied too. It’s essential during the work of this algorithm that all the hard constraints should be satisfied, while it’s not such essential that the soft constraints are satisfied. The chromosome is deemed to be a solution, if it satisfies all the hard constraints, and the optimal solution is the one that satisfies all the soft constraints too, producing a fitness function value of 0.
  • Parent Chromosomes:
    Parent chromosomes represent a parent with 2 children – male and female chromosomes, which are the actual data of the parent chromosome. These chromosomes are named as parent chromosomes, because they will reproduce using the crossover and mutation modifications to produce newer and more adaptive to the environment chromosomes, which will be decided by the fitness function, and choosing the most fitted parents for the next population.
  1. Population:
    Population represents the current state in the process of the evolution. Population is a set of chromosomes, which is limited to specific size. By continuous reproduction, and choosing the best fit for the next population, the present population is expected to develop to produce better and better results.
  2. Crossover Point:
    Crossover is one of the methods used for reproduction. Using this method reproduction is being made. This reproduction method is based on taking parent chromosomes and choosing a point – crossover point, where each one of the parent chromosomes will be divided into two parts. Then, the parent chromosomes reproduce by exchanging their parts, to produce two new chromosomes.One of the parameters needed to be identified how the crossover points are chosen. The basic method for choosing the crossover point is to choose a random cell within the chromosome. The second parameter that needs to be identified is the number of crossovers that has to been. Just as we are doing crossover in one point, we can do it in more than one point that might give a more variety to the reproduced chromosomes.
  3. Mutation:
    Mutation is another method of reproduction. Mutation is singular method of reproduction, as it is performed on a single chromosome, and not on the parent chromosomes (male and female). Using this method, we choose one of the cell parameters that we want to differentiate from others – to mutate. Then we mutate it to produce some new cell that introduces new data to the chromosome, and thus can affect its fitness.The two parameters that need to be identified here are: what should be mutated – some property of a cell, or all properties of a cell, and the second parameter – what is the best number of mutations that need to be done to introduce some variety, and not to get further from the optimal solution.
  • Objective function f(x):
    Objective function is the function which takes chromosome as an argument, evaluates it and gives the number, which represents the fitness, or the adaptation of the chromosome to the current environment.
    In our scenario, the fitness function gives optimal value, when it equals 0. The fitness function is supposed to be evaluated checking the chromosome to match the hard and soft constraints. In our program the genetic algorithm goes through 2 stages of counting fitness function:
    1 – hard stage – here we estimate the hard constraints within the chromosome
    2 – soft stage – here we estimate the soft constraints within the chromosome, is used when the chromosome hard stage is accomplished.
  • Stopping criteria:
    The basic stopping criteria which can be considered is reaching zero as a fitness function of soft stage. As our program is multi-threaded, user can view the progress, pause and resume the algorithm while it’s running. These options help a user a lot, as the program does not have to make decision when to stop. Depending on the results, and user decision, the stop criteria will be decided, unless the fitness function of the second, soft stage reaches zero.

1.2.4.3 - Programming Elements Used

Here you can find the description of the programming elements used in the implementation of the Genetic Algorithm on the problem of Time-Tabling. To support the implementation we have used the OOP approach in our implementation, so the classes where organized into a specific hierarchy to make it easier and more organized, thus producing cleaner and faster implementation.

Harshly saying, the classes are organized into the following hierarchy:

Screen Shot 2014-09-11 at 1.49.09 AM

Each of the above stated classes are described below in detail:

buffer

Buffer is a static class that is used in order to communicate with the database and buffer some data. This will speed up the performance and make communication with the database easier.FetchData() – method fetches the data from the database by executing the “SELECT ALL” query, in order to speed up performance of the systemgetRandomCell() – fetches random cell, which should be validgetRandomGCCell() – fetches the random cell, according to the provided groupid argumentWriteChromosome() – saves the provided chromosome to the specified text fileReadChromosome() – reads a saved into text file chromosome and returns it as a result of the method.

cell

Cell class represents the smallest building block of the chromosome. If we want to define the chromosome as a schedule, than we can define a cell as a one-slot entry, which specifies the slot number, teacher, room and subject and group to be taught.We provided the cell object with a copy construct, to return a new instance copy of the same cell. Also as we can notice from the above description, GenerateValidCell() will perform a call to the getRandomCell() / getRandomGCCell() of the Buffer static class in order to make current cell as a generated random valid cell, with or without the pre-specified groupid value.MutatedBy() method will accept a value, indicating what is the mutation performed by, and perform a mutation on the cell, executing the corresponding query. The available mutation types are: mutate by all (new generated random cell), mutate by group, mutate by period, mutate by room and mutate by teacher. Notice that the result of the mutation will be a valid cell._cumul property is very important when counting the fitness function. As the fitness function iterates through the current cell, the _cumul property will record the cumulative impact of the current cell on the fitness function total value._tillnow property will be also important as it also record the cumulative impact, but not totally cumulative, but just on the period of last genAlgoSettings.CrossOverCellNumbers number of cells. Then it can be used in future to define which interval of chromosome had the greatest impact on the total fitness function value.f – the flag which will indicate if the current cell has any contribution on worsening the fitness function value. Can be helpful in drawing the chromosome and coloring the bad cells.

chromosome

As we said before, a Cell is the smallest building block of the chromosome, so the chromosome is the schedule itself, defined by a set of Cells. Chromosome is the item within the population, an item which will be evaluated by the fitness function, and finally, the output of the program.elements property – is an array of cells, which define the chromosome itselfff – defines the latest fitness function value, which is saved for faster access to avoid evaluating chromosome each timefv – fitness values, which assess the compliance on the chromosome to each of the 4 hard constraintsnumber – number of elements within the chromosomecopy() – the same copy constructor to copy the chromosome as a whole producing a new instance, and new instances of the contained cells

FF() – evaluate the chromosome using the fitness function – this will update the ff and fv properties of the current chromosome

GenerateChromosome() - will generate a totally random chromosome of number number of cells and is used to introduce the initial random population

MarkUsed() – will mark the current chromosome as used, by activating the used property / flag. This will be helpful to avoid the chromosomes that were already chosen as parents in the iteration through the present population. This way we can ensure that in the next iteration the chromosomes used will be not identical to the same parents already chosen.

clearFlags() – clears all the flags f of a cell

clearTillNow() - clears all the _tillnow and _cumul properties of the cells to be reused when transferring from the current population to the new population.

UpdateGroups() – retrieves and updates the groupid’s values of each cell in the chromosome, according to the _subject and _groupnumber properties. This will ensure the consistency of data, when the chromosome of some old data was saved, and need to be retrieved, but the values of the groupid’s has been changed.

Mutation() – mutates the chromosome, using the procedure defined in the Reproduction.Mutation.Run() static method.

fitnessfunction

Fitness function is a very important static class, which defines the procedures and heuristics for counting the fitness function value. Actually the main methods for counting the fitness function values are the count() and countSoft() method. These methods accept a chromosome as an input parameter, and return the total fitness function value as a result of execution.The interesting thing about the fitness function class that it is designed to operate in 2 modes: soft mode and hard mode. Depending on the current stage value, defined by the HeuristicSwitcher.Stage property the fitness function will decide whether to count its value based on the soft or hard constraints. the chromosome function FF() (Chromosome.FF()) will check for the value of HeuristicSwitcher.Stage to match Stages.Hard or Stages.Soft, and will choose to correspondingly return FitnessFunction.count() or FitnessFunction.countSoft() for the current chromosome.The constraints and their weights are multiplied correspondingly to show the impact of the constraint of the fitness function. They are organized as follows:Hard Constraints / Weights:

HC1() / hc1w, HC2() / hc2w, HC3() / hc3w, HC4() / hc4w

Soft Constraints / Weights:

SC1() / sc1w, SC2() / sc2w, SC3() / sc3w, SC4() / sc4w

genalgo

This is considered as the main class of our implementation. The GeneticAlgorithm class is responsible for the main code flow management. The class contains several essential properties, which contribute to the work of the algorithm:

  • PresentPopulation – the current population, which is the main source of reproduction to create the new population
  • NewPopulation – new population, which is produced step by step by taking parents from the present population, performing and crossovering to reproduce into the new population.
  • ResultChromosome – here the result chromosome of the best fitness function should be saved.

The algorithm consists of the three major parts: init() – initialization, connection to the database, and preprocessing of the data, input() – generating first random PresentPopulation, which will be the initial seed of the program, and run() – implementation of the algorithm itself to obtain the ResultChromosome using the PresentPopulation seed.

heuristicController

Heuristic Controller – this class is responsible for monitoring the performance of the currently used heuristic of reproduction and changing it to obtain better results. This way we ensure that the program will not enter the infinite loop and will try to reproduce in different types of defined techniques to reach better results. In order to monitor the performance of currently active heuristic, several properties are needed to be recorded and constantly reviewed:This is done by two methods of the static class: checkPerformance() – which will check the performance of the current heuristic based on some specific metrics, and takeAction() – takes some specific appropriate heuristic modification action, in case the performance using the active heuristic is unsatisfactory. Check performance works in the following way:

  • Gradient – is the difference between the old and new values for the fitness functions (the change in fitness function values)
  • AccGradientAvg – Gradient Average till the current moment in time (iteration)
  • Stepping = Gradient / AccGradientAvg – shows the share of the current gradient from the gradient average.
  • If the Stepping produced is less than the SteppingThreshold, then it’s considered a bad result

The algorithm checks for the number of consequent bad results. If it is more than GenAlgoSettings.SteppingNumberThreshold, than the whole heuristic after several number of chances is considered as bad, and a next heuristic is executed within the takeAction() method.

heuristicSwitcher

Heuristic Switcher checks for the currentHeuristic number and redirect the program to the next heuristic using the NextHeuristic() method. A heuristic within our implementation is defined as a set of values for optional attributes, such as mutateby, CrossOverNumber, MutationNumber, SteppingNumberThreshold, SmartCrossOver and CrossOverCellNumbers. Using these parameters we will be able to define specific heuristic, way of reproduction that can be used by the parent chromosomes to reproduce in different ways. The Stage parameter saves the value, defining if the program is current running in the hard or soft stage.

parents

A male and a female chromosome is considered as parent, which will in future reproduce to obtain a better population. This parent object has a copy() constructor, which will call the copy constructors of both the chromosomes to produce a completely new copy of the parent instance.CrossOver() is a method which will recall the Reproduction.CrossOver.Run() in order to following the specific procedure for the current parent chromosomes. MarkUsed() will mark both the parent chromosomes, male and female as used.

population

Population is a set of chromosomes, which should be used to reproduce and obtain better population through evolution. The population contains the property number, which is the pointer to the last filled chromosome in the population. The count property returns the total number of chromosomes within the current population.This class contains several assisting methods to assist the work of the population in an easier way:

  • addChromosome() – adding a new chromosome to the population
  • addParents() – add parent to the current population, i.e. adding male and female chromosomes to the current population
  • ChooseParents() – choose the best 2 chromosomes as parents, with the smallest fitness function values, and mark them as used to avoid re-choosing them on the next iteration.
  • clearUsed() – clear used flag for all the chromosomes in the population
  • GeneratePopulation() – generate a population of random valid chromosomes.

getAllFitness() – perform fitness function execution FF() for all the chromosomes in the population.

reproduction

Reproduction class performs the actual reproduction, corresponding to the two methods: CrossOver and Mutation. The two methods of reproduction are defined as classes within the static reproduction class.

stages

Stages Enumerator – indicates the HARD and SOFT stage of the running the algorithm. This will determine the fitness function count method chosen and the heuristics and their order.