In a previous post, I wrote that the disjoint sets algorithm is one of the very few algorithms every programmer should know. That got me thinking. What about "must"? If everyone must know about disjoint sets, what other algorithms must every programmer know about? I made a "top ten" list of algorithms and data structures every programmer must know about. The list is as follows. Lists, Arrays, Stacks. Lists, arrays, and stacks are certainly the most basic data structures, yet these building blocks can still hold a few surprises.
For example, counter-intuitive "past-the-end" positions are used extensively (in the C++ STL, for example) to represent the position just past the last element in a list. This makes, for example, inserting an item at the end position rather efficient. While lists do not allow direct access to an element, arrays do, but at the cost of allocating an amount of memory known a priori. Conversely, inserting an element into a list, given the current position, takes constant time, while inserting a new element into an array is more costly, depending on whether or not one wants to preserve ordering.
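The trade-off above can be made concrete with a tiny sketch (names like `insert_after` and `array_insert` are my own, purely illustrative): splicing into a linked list at a known position is O(1), while inserting into an array at a given index forces every later element to shift.

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def insert_after(node, value):
    """O(1): splice a new node in right after the given position."""
    node.next = Node(value, node.next)

def array_insert(arr, index, value):
    """O(n): every element after `index` must shift one slot to the right."""
    arr.append(None)                 # grow by one slot
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]          # shift right
    arr[index] = value

# Build the tiny list 1 -> 3, then insert 2 after the head in one step.
head = Node(1, Node(3))
insert_after(head, 2)
values = []
n = head
while n:
    values.append(n.value)
    n = n.next
print(values)  # [1, 2, 3]

a = [1, 3, 4]
array_insert(a, 1, 2)
print(a)  # [1, 2, 3, 4]
```

Note that the array version pays for its O(n) insertion with O(1) direct access, which the list cannot offer.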
Stacks are a special case of both, one could say, because they behave mostly like lists where operations are allowed only at one end, and like arrays because they are most often implemented in a contiguous span of memory, i.e., an array. As these three are common building blocks of more complex algorithms and data structures, you should really master them. Trees. Trees, or more exactly search trees, whether AVL, red-black, or otherwise, are the next data structures on the list, offering O(log n) operations for searching, inserting, and removing values (in the average case). The most interesting tree varieties try to ensure that their worst cases also stay in O(log n), that is, that they don't go overly deep or take inordinately long to perform their operations. To do so, the prevailing strategy is to balance the tree so that it is of (almost) equal depth everywhere. Some varieties make the tree even shallower by having a branching factor much higher than two, as is the case with B-trees (sometimes also known as multiway trees).
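The basic search-tree operations can be sketched with a plain, unbalanced binary search tree; this is only a sketch (helper names are my own), and the balanced variants mentioned above add rebalancing on top of exactly these operations.

```python
def bst_insert(root, key):
    """Insert `key` into the subtree; returns the (possibly new) root."""
    if root is None:
        return {"key": key, "left": None, "right": None}
    side = "left" if key < root["key"] else "right"
    root[side] = bst_insert(root[side], key)
    return root

def bst_contains(root, key):
    """O(depth) search: go left or right at each node until found or empty."""
    while root is not None:
        if key == root["key"]:
            return True
        root = root["left"] if key < root["key"] else root["right"]
    return False

root = None
for k in [5, 2, 8, 1, 3]:
    root = bst_insert(root, k)
print(bst_contains(root, 3), bst_contains(root, 7))  # True False
```

Insertion order determines the shape here; feed this tree sorted input and it degenerates into a list, which is precisely the worst case that balancing schemes exist to prevent.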
Splay trees, on the other hand, unbalance the tree to shift the most often accessed items near the root, while remaining a search tree. Understanding these data structures and the trade-offs involved will help you choose wisely which one suits your application best. Note that the structures that equalize depth do not only make sure that the worst case is the average case: they implicitly suppose that all items in the collection have an equal probability of being accessed, something quite contrary to real-world access patterns, which are typically highly skewed. Sorting and Searching. Data does not always come in the right order, so it is possible that you will have to sort it before applying any meaningful algorithm to it. Sorting is a complex topic, but I think the two algorithms you really should know about are QuickSort and Radix Sort. QuickSort is a comparison-based sort, that is, elements are compared to each other to determine their relative order. Eventually, QuickSort makes enough comparisons (O(n log n) for n items, on average, but with a worst case of O(n²), which can be avoided, but that's another story for now) to produce a sorted list of items.
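The two sorts discussed here can be sketched briefly. These are illustrative versions only, not production code: a real QuickSort sorts in place and chooses its pivot carefully, and this Radix Sort assumes non-negative integer keys.

```python
def quicksort(items):
    """Comparison-based: O(n log n) on average, O(n^2) in the worst case."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]
    mid = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quicksort(left) + mid + quicksort(right)

def radix_sort(items, base=10):
    """LSD radix sort: linear in n for fixed-width non-negative integer keys."""
    if not items:
        return items
    passes = len(str(max(items)))        # number of digits to process
    for d in range(passes):
        buckets = [[] for _ in range(base)]
        for x in items:
            buckets[(x // base ** d) % base].append(x)  # stable bucketing
        items = [x for bucket in buckets for x in bucket]
    return items

data = [170, 45, 75, 90, 802, 24, 2, 66]
print(quicksort(data))   # [2, 24, 45, 66, 75, 90, 170, 802]
print(radix_sort(data))  # [2, 24, 45, 66, 75, 90, 170, 802]
```

Note how Radix Sort never compares two keys to each other, which is exactly why the comparison-sort lower bound does not apply to it.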
The theoretical lower bound for comparison-based sorts is Ω(n log n) comparisons, so QuickSort does very well on average. The great simplicity of the algorithm makes it very interesting in a very wide range of cases. Radix Sort, on the other hand, is an address-transformation sort that sorts the items in linear time, that is, in O(n), making it much faster than QuickSort. Radix Sort is simplest when keys are numeric or of fixed width; dealing efficiently with variable-length keys makes the algorithm slightly more complex. Searching a sorted array can be performed using basic binary search in O(log n) time. Creating and searching efficient indexes will also play a major role in managing and searching large data sets. Priority Queues.
Sometimes you don't really care whether a data set is completely sorted. You just want to determine its maximum (or minimum) valued item rapidly and remove it from the set. Ideally, determining the maximum item should be an O(1) operation, and adding and removing values O(log n) operations. It turns out that heaps, as implementations of priority queues, are pretty efficient. Not only do they have efficient algorithms, they can also be implemented in contiguous arrays, dispensing with pointer-based data structures and potentially saving a lot of memory. There are a lot of applications of heaps, ranging from scheduling (determining who goes next) to cache management (removing the oldest items from the cache).
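Python's standard `heapq` module shows the array-backed heap directly: the heap lives in a plain list, with the children of node i at indices 2i+1 and 2i+2, so no pointers are needed. A small sketch (the task names are invented for illustration):

```python
import heapq

# heapq maintains a min-heap inside an ordinary contiguous list.
tasks = []
heapq.heappush(tasks, (3, "low priority"))
heapq.heappush(tasks, (1, "urgent"))
heapq.heappush(tasks, (2, "normal"))

print(tasks[0][1])  # peek at the minimum: O(1) -> "urgent"
order = [heapq.heappop(tasks)[1] for _ in range(3)]  # each pop: O(log n)
print(order)  # ['urgent', 'normal', 'low priority']
```

Since `heapq` is a min-heap, a max-priority queue is usually obtained by negating the keys.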
Pattern Matching and Parsing. Every single one of us has had to write a parser or a filter of some sort, whether to find data in an insanely large log or to read a simple configuration file. Not always, but very often, the sanest way to go about it is to use regular expressions, either through a library or by implementing the regular expression algorithms yourself, which would make it the insane way, I guess.
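A minimal log-filtering sketch using Python's `re` module (the log lines and the pattern are invented for illustration):

```python
import re

# Hypothetical log excerpt; the pattern pulls the level and message
# out of every ERROR line.
log = """\
2024-01-01 12:00:00 ERROR disk full
2024-01-01 12:00:01 INFO heartbeat
2024-01-01 12:00:02 ERROR timeout
"""

pattern = re.compile(r"^\S+ \S+ (ERROR) (.+)$", re.MULTILINE)
errors = pattern.findall(log)
print(errors)  # [('ERROR', 'disk full'), ('ERROR', 'timeout')]
```

Compiling the pattern once and reusing it, as above, is what makes this approach viable on insanely large logs.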
Understanding how regular expressions work will greatly help you process and filter insufficiently structured data; small wonder they are so popular. Hashing. Hash functions play a central role in hash tables, cache management, and cryptography. Despite the superficial differences in application, all hash functions are closely related; in fact, they differ more in quality than in nature.
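One measure of that quality is how unlikely collisions are. The classic birthday-problem computation can be sketched directly (the function name `collision_probability` is my own):

```python
def collision_probability(n_items, space):
    """P(at least one collision) among n_items values drawn uniformly
    from `space` possible hash values: the birthday problem, computed
    with the exact product form rather than an approximation."""
    p_none = 1.0
    for k in range(n_items):
        p_none *= (space - k) / space   # k-th draw avoids the first k values
    return 1.0 - p_none

# 23 people, 365 "hash values": the classic birthday paradox, ~50.7%.
print(round(collision_probability(23, 365), 3))

# Two random 128-bit hashes: collision probability is vanishingly small.
print(collision_probability(2, 2 ** 128) < 1e-30)  # True
```

The same computation, run with your expected item count and hash width, tells you whether a given hashing scheme is sufficient for your application.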
A hash function takes a message and deterministically produces a signature that looks like a pseudo-random number, n bits long, usually with n rather large, say 128 or 256. The harder it is to correlate the hash value with the original data, the safer the hash function is. If it is really hard to find the original data given a hash value, the function can serve as a one-way function and be used in cryptographic applications. If the hash function is just good enough, it can serve as a (most probably) unique identifier for data items, which in turn can be used for cache management or hash tables. In either case, you have to understand the mathematics of the probability of collisions (two or more items hashing to the same value) so you can assess which hashing scheme is sufficient for your application. Read about von Mises' birthday paradox to understand how to compute the probability of collision. Disjoint Sets.
The disjoint sets (union-find) data structure is a helper structure that you will need in a wide variety of algorithms, from graph algorithms to image processing. Disjoint sets serve to represent multiple sets within a single array, each item being a member of one of many sets.
Considered like that, this sounds like a rather special case, but in fact there are a number of applications, not all of them obvious. Partitions that arise in graph algorithms are most efficiently represented using the disjoint set data structure. It can also serve to segment an image, a process also known as computing the connected components.
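A minimal union-find sketch, storing every set in a single parent array; it includes path compression (discussed below) and the standard union-by-size refinement. The class and method names are my own.

```python
class DisjointSets:
    """Union-find over n items stored in a single array."""
    def __init__(self, n):
        self.parent = list(range(n))  # each item starts as its own set
        self.size = [1] * n

    def find(self, x):
        """Return the representative of x's set, compressing the path."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b (smaller tree under larger)."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

ds = DisjointSets(6)
ds.union(0, 1); ds.union(1, 2); ds.union(3, 4)
print(ds.find(0) == ds.find(2))  # True: 0 and 2 ended up in the same set
print(ds.find(2) == ds.find(4))  # False: still separate sets
```

For connected-component labeling of an image, the items would simply be the pixels, with a union for every pair of adjacent same-colored pixels.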
What's also very interesting about disjoint sets is just how incredibly efficient the operations are: using path compression, operations can be performed in expected constant time! The special case where there are only two possible sets reverts to a simple bitmap, a data structure that is simple to manage and requires comparatively little memory, use of which can be further reduced, at a price, using compression. Graph Algorithms and Data Structures. Problems like computing the minimum spanning tree, determining the shortest path between two nodes, and detecting cycles will arise in a number of situations. Google's PageRank, for example, is a very concrete application of graph algorithms. Often, seemingly unrelated problems can be mapped to graph problems for which very efficient algorithms, possibly dynamic programming related, already exist. There is also a large body of literature devoted to the data structures used for graphs, considering every possible special case: sparse graphs, edge-rich graphs, networks, etc.
Dynamic Programming. Closely related to graph algorithms, dynamic programming exploits the fact that the optimal solution to a large problem can be expressed as an optimal combination of solutions to sub-problems. Not all problems are amenable to this method, because not every objective function abides by the principle of optimality, but many optimization problems do. Dynamic programming exchanges memory for computation, a generally beneficial trade-off. Memoization could be considered a limited form of dynamic programming where previous evaluations of an expensive function are cached and reused rather than recomputed when asked for again. Memoization greatly reduces computational complexity when used in combination with an optimization (or simply recursive) algorithm.
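Memoization can be sketched in a few lines with Python's standard `functools.lru_cache`, using the Fibonacci numbers as the usual illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every previous evaluation
def fib(n):
    """Memoized Fibonacci: O(n) distinct calls instead of an
    exponential number for the naive recursion."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, computed instantly
```

Remove the decorator and `fib(40)` alone makes hundreds of millions of calls; with it, each sub-problem is computed exactly once.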
For example, the n-th Fibonacci number can be computed in O(n) time using memoization, while the basic recursive formulation results in no fewer than O(φⁿ) calls! Here, φ is Phidias' number, the golden ratio. State Space Search Algorithms. Sometimes the scale of the problem, or the vastness of the state space, makes it impossible to represent the problem as a graph. Consider chess as the example par excellence. The exact number of distinct valid states of the game is still debated, but it is thought to be on the order of 10⁴³, and the number of arcs in the game graph somewhere around 10¹²⁰, as first estimated by Claude Shannon, and therefore known as the Shannon number.
It would be infeasible to search the graph corresponding to all possible games to determine the optimal move given any valid chess position. To deal with this immense state space, one would use one of the many state space search algorithms, such as hill climbing or simulated annealing, or, if enough is known about the objective function, an algorithm like A*. State space search is very often used for games and other optimization problems where the structure is too complex, the state space too vast, or too little is known to derive a more efficient algorithm. Genetic algorithms are closely related to these search methods; there, the state space is searched at many different points in parallel. The Darwinist metaphor is apt, up to a point, as "genes" (vectors) are "mutated" (varied randomly) to search the state space.
Only the fittest (the higher-valued vectors, under the objective function) are kept for the next "generation" (iteration); the others "die off" (are dropped). I could have made the list much longer, at least to include topics such as data compression, a personal favorite. I could have included other topics highly useful in the context of industrial informatics, but I also wanted the list to be a Ten Commandments type of list (without the guilt and the vengeful God thing), something that would be easy to remember and that would apply to most programmers out there.
Of course, you just read those and are thinking "why didn't he talk about XYZ?" Maybe I did miss something important; it may also be that "XYZ" is more of a niche topic than you think. I cannot claim this list is authoritative in any way; yet it follows more or less closely what you will find in a typical "Introduction to Algorithms" book. Either way, let me know. This post is also in line with my answer to one of Gnuvince's commentators.
I think that, again, it is not about being a guru in every one of the ten topics I presented, but merely proficient, and about developing the reflex to see the similarities between your current problem and one of the algorithms you already know, even though the mapping may not be exactly self-evident. This is not a "pick n out of m" list, however. The more, the better.
Algorithm For Chess Programming Software
This is the first article in a six-part series about programming computers to play chess, and by extension other similar strategy games of perfect information. Chess has been described as the Drosophila melanogaster of artificial intelligence, in the sense that the game has spawned a great deal of successful research (including a match victory against the current world champion and arguably the best player of all time, Garry Kasparov), much like many of the discoveries in genetics over the years have been made by scientists studying the tiny fruit fly. This article series will describe some of the state-of-the-art techniques employed by the most successful programs in the world, including Deep Blue.

Note that by the time the series is completed (in October), I will have written a simple implementation of the game in Java, and the source code will be freely available for download on my web site. So if you want to see more code samples, be patient; I'll give you plenty in due time!

Games of Perfect Information

Chess is defined as a game of 'perfect information', because both players are aware of the entire state of the game world at all times: just by looking at the board, you can see which pieces are alive and where they are located. Checkers, Go, Go-Moku, Backgammon and Othello are other members of the category, but stud poker is not (you don't know what cards your opponent is holding in his hands). Most of the techniques described in this series will apply more or less equally to all games of perfect information, although the details will vary from game to game. Obviously, while a search algorithm is a search algorithm no matter what the domain, move generation and position evaluation will depend completely on the rules of the game being played!

What We Need

In order to play chess, a computer needs a certain number of software components. At the very least, these include:
- Some way to represent a chess board in memory, so that it knows what the state of the game is.
- Rules to determine how to generate legal moves, so that it can play without cheating (and verify that its human opponent is not trying to pull a fast one on it!).
- A technique to choose the move to make amongst all legal possibilities, so that it can choose a move instead of being forced to pick one at random.
- A way to compare moves and positions, so that it makes intelligent choices.
- Some sort of user interface.

This series will cover all of the above, except the user interface, which is essentially a 2D game like any other. The rest of this article describes the major issues related to each component and introduces some of the concepts to be explored in the series.

Board Representations

In the early days of chess programming, memory was extremely limited (some programs ran in 8K or less) and the simplest, least expensive representations were the most effective.
A typical chessboard was implemented as an 8x8 array, with each square represented by a single byte: an empty square was allocated value 0, a black king could be represented by the number 1, etc.

When chess programmers started working on 64-bit workstations and mainframes, more elaborate board representations based on 'bitboards' appeared. Apparently invented in the Soviet Union in the late 1960s, the bitboard is a 64-bit word containing information about one aspect of the game state, at a rate of 1 bit per square. For example, a bitboard might contain 'the set of squares occupied by black pawns', another 'the set of squares to which a queen on e3 can move', and another, 'the set of white pieces currently attacked by black knights'. Bitboards are versatile and allow fast processing, because many operations that are repeated very often in the course of a chess game can be implemented as 1-cycle logic operations on bitboards. Part II of this series covers board representations in detail.

Move Generation

The rules of the game determine which moves (if any) the side to play is allowed to make. In some games, it is easy to look at the board and determine the legal moves: for example, in tic-tac-toe, any empty square is a legal move.
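The tic-tac-toe case ties the two ideas together nicely: with the board stored as bitboards (here just 9 bits, one per square, rather than chess's 64), legal-move generation collapses into a single mask over the empty squares. This is a sketch of my own, not code from the series.

```python
# Tic-tac-toe on 9-bit "bitboards": bit i set means square i is occupied.
FULL = (1 << 9) - 1  # mask covering all nine squares

def legal_moves(x_board, o_board):
    """Return the indices (0..8) of empty squares, i.e. the legal moves.
    One AND and one NOT compute the whole set at once."""
    empty = FULL & ~(x_board | o_board)
    return [i for i in range(9) if empty & (1 << i)]

# Hypothetical position: X on squares 0 and 4, O on square 8.
x, o = (1 << 0) | (1 << 4), (1 << 8)
print(legal_moves(x, o))  # [1, 2, 3, 5, 6, 7]
```

Chess bitboards work on the same principle, only with 64-bit words and far more elaborate masks per piece type.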
For chess, however, things are more complicated: each piece has its own movement rules, pawns capture diagonally and move along a file, it is illegal to leave a king in check, and the 'en passant' captures, pawn promotions and castling moves require very specific conditions to be legal. In fact, it turns out that move generation is one of the most computationally expensive and complicated aspects of chess programming. Fortunately, the rules of the game allow quite a bit of pre-processing, and I will describe a set of data structures which can speed up move generation significantly. Part III of this series covers this topic.

Search Techniques

To a computer, it is far from obvious which of many legal moves are 'good' and which are 'bad'.
The best way to discriminate between the two is to look at their consequences (i.e., search sequences of moves, say 4 for each side, and look at the results), and to make sure that we make as few mistakes as possible, we will assume that the opponent is just as good as we are. This is the basic principle underlying the minimax search algorithm, which is at the root of all chess programs. Unfortunately, minimax's complexity is O(b^n), where b (the 'branching factor') is the number of legal moves available on average at any given time and n (the depth) is the number of 'plies' you look ahead, where one ply is one move by one side.
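The minimax principle can be illustrated on a tiny hand-built game tree rather than on chess itself; the leaf numbers below are hypothetical evaluations from the maximizing player's point of view.

```python
def minimax(node, maximizing):
    """Return the value of the position assuming both sides play optimally.
    A node is either a leaf (an int: a static evaluation) or a list of
    child nodes (the positions reachable in one move)."""
    if isinstance(node, int):
        return node
    children = [minimax(child, not maximizing) for child in node]
    return max(children) if maximizing else min(children)

# Depth-2 tree: we pick a branch, then the opponent picks the leaf
# that is worst for us within that branch.
tree = [[3, 12], [2, 4], [14, 1]]
print(minimax(tree, True))  # 3: branch [3, 12] guarantees at least 3
```

Note that the tempting branch `[14, 1]` is correctly rejected: a perfect opponent would steer it to 1. This exhaustive version visits every one of the O(b^n) nodes, which is exactly why the pruning algorithms described next exist.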
This number grows impossibly fast, so a considerable amount of work has been done to develop algorithms that minimize the effort expended on search for a given depth. Iterative-deepening Alphabeta, NegaScout and MTD(f) are among the most successful of these algorithms, and they will be described in Part IV, along with the data structures and heuristics which make strong play possible, such as transposition tables and the history/killer heuristic.

Another major source of headaches for chess programmers is the 'horizon effect', first described by Hans Berliner. Suppose that your program searches to a depth of 8 plies, and that it discovers to its horror that the opponent will capture its queen at ply 6. Left to its own devices, the program will then proceed to throw its bishops to the wolves so as to delay the queen capture to ply 10, which it cannot see because its search ends at ply 8. From the program's point of view, the queen is 'saved', because the capture is no longer visible.
But it has lost a bishop, and the queen capture reappears during the next move's search. It turns out that finding a position where a program can reason correctly about the relative strength of the forces on the board is not a trivial task at all, and that searching every line of play to the same depth is tantamount to suicide.
Numerous techniques have been developed to defeat the horizon effect; quiescence search and Deep Blue's singular extensions are among the topics covered in Part V on advanced search.

Evaluation

Finally, the program must have some way of assessing whether a given position means that it is ahead or that it has lost the game. This evaluation depends heavily upon the rules of the game: while 'material balance' (i.e., the number and value of the pieces on the board) is the dominant factor in chess, because being ahead by as little as a single pawn can often guarantee a victory for a strong player, it is of no significance in Go-Moku and downright misleading in Othello, where you are often better off with fewer pieces on the board until the very last moment. Developing a useful evaluation function is a difficult and sometimes frustrating task. Part VI of this series covers the efforts made in that area by the developers of some of the most successful chess programs of all time, including Chess 4.5, Cray Blitz and Belle.

Conclusion

Now that we know which pieces we will need to complete the puzzle, it is time to get started on that first corner. Next month, I will describe the most popular techniques used to represent chess boards in current games. See you there!

François Dominic Laramée, April 2000.