During my undergraduate, I got to work on a number of projects, both graded or on audit. The most notable ones among them, spanning nearly an entire semester each, are listed below, with short summaries about each work. For my publications, check out my Publications
Rate Adaptive Resource
Allocation in Multi-User OFDM Systems using NSGA2
by Nitin Sharma,
Adithya Rao, Akshat Dewan, Mustafa Safdari
- This was my first paper in
EA. Our team used a MOEA to optimize a conflicting set of objectives in a
resource allocation problem.
- Published under proceedings
of 4th IEEE International Conference on Wireless Communication and Sensor
Networks 2008 (ISBN: 9781424433278)
- The whole concept was to maximize the throughput of a base station while performing efficiently, when allocating user-wise resources. It is a resource multi-objective resource allocation problem, as one tries to utilize the maximum capacity of the channel given a Total Power constraint for the base station, and also deliver on some performance guarantee. In literature, there exist many different optimization schemes to allocate various resources to the users, catering to various kinds of service guarantees. We chose maximizing the payoff, or the rate, of the user with the minimum allocated channel resources (to guarantee a decent service to the worst case user, avoiding a starvation effect). In other words, maximizing the minimum payoff.
- Choosing this as one of the objective does give us two seemingly conflicting objectives, as by pumping in more resources for a user who doesn't have adequate channel conditions would result in reducing the total capacity utilization. Therefore a tradeoff decision must be made.
- NSGA2 was used as MOEA. The
results obtained were better than non-evolutionary optimization approach
by 20% and better then previous evolutionary approach using simple GA by
- To select as the final solution of the simulation, the Rank 1 individual in the first front is selected as the final solution.
- A lot more work needs to be done on this itself, as we need to analyze how the two functions converge individually, and how different solutions can be picked up from the Pareto optimal front as the final solution instead of taking the rank one solution as the final answer.
- I learned about:
- formulating an optimization
- identifying constraints,
variables, affecting parameters
- coding for the candidate
solutions in the EA
- Reformulating the objective
function to make its optimization easier
- Fine tuning parameters of
the EA for better results
- Analyzing the data from the
experiments and deriving a logical conclusion
- Identifying drawbacks,
areas of shortcomings, pitfalls in the approach and scope of future
Evolving Universal Hash Functions using Genetic Algorithms
by Mustafa Safdari
- This was my second paper in
EA. It dealt with applying a simple genetic algorithm to construct a
universal hash function that gives minimum number of collisions for a
given set of keys
- Published under proceedings
of IEEE International Conference on Future Computing and Communication
2009 (ISBN: 978-0-7695-3591-3), part of Proceedings of
the 11th Annual Conference Companion on Genetic and Evolutionary
Computation Conference 2009: Late Breaking Papers, presented at the GECCO 2009 Undergraduate
- It began as an exercise to use Genetic Programming to come up with the a "good enough" mapping function to hash a given input distribution. I first tried out using MIPS assembly code as the basis for Genetic Programming, and evolving a hash function out of the generated assembly code. But I was unable to frame fitness functions which would create enough selective pressure, and as a result, the hash function generated was not a good trade-off between the number of collisions and the computational expenses.
- When I learned about Universal Hash Functions, I realized instead of using GP to evolve a completely new Hash Function, why not use the Universal Hash Function Template to come up with a pseudorandom Hash Function. Universal Family of Hash Functions already provides a statistical upper bound on the number of collisions, so why not exploit that property to 'choose' a member of this set? Well, it turns out that for a given input distribution, if that 'random' choice turns out to be a bad one, then we will get a lot of collisions. Then the guarantee of average performance isn't much relevant, as a single worst case would mean a totally bad choice with lots of collisions for that function. Instead I sough to find out how much of a computational overhead is required to make a 'good enough' choice for that hash function, so that we are guaranteed least possible collisions at all times. Enter GA. First I tried to find out, what is the fitness landscape like for various choices of the parameters. It turned out to be like a rocky valley, steep sharp valleys, lots of fluctuation between neighboring solutions' fitnesses. But there was an overall average trend to the graph too, which varied with the input distribution, and the number of buckets to map the keys into.
- The first figure shows the spiky nature of the graph, where there is a drastic difference in fitness between neighboring solutions. But most of runs gave a similar-natured graph, shaped like a chair, with a region like A, where 'good' solutions were usually found. Note that in this case, the 'best' solution was actually in region B, but instead of checking for the entire graph, if we hit region A in each graph, we can consistently get good solutions for each run. It saves overhead of having to search the entire landscape. BUT, in some cases, the region A isn't 'good enough', then we do have to search for region B, as it offers a much better solution. In the case of this figure however, both regions are equally 'good'.
- The next figure above shows a 3-D view of the landscape. As I mentioned, there is a general trend to the graph, its like a chair. It proceeds down, then a bump and then averages off. this is a characteristic for most run configurations. The graph plotted here is with the parameters to be optimized as x and y variables. The z axis shows the number of collisions, which is the inverse of fitness. The more the collisions, lesser the fitness. For each run configuration, the trick then remains to find this general region A (or B), which if identified, we can select any solution within it. But there isn't any general formula for finding that region, none that appeared to me. Also, we cannot make a decision whether to select region A or B, by just observing the input configuration. Some hash function evaluations have to be made to get the underlying trend.
- This is where the GA enters. I tuned the GA so that the search looks for this particular region, with the characteristics mentioned above. Once it hits that region (whether A or B is a matter of luck - or chance), the population quickly converges to a solution in that region.
- The results from this experiment were a mixed bag. Firstly, the GA got me a very good solution (least possible collisions for a given input key distribution) in less than 10 generations, when running with a fairly big population size (100). I tried playing around with the parameters, tweaking various rates and measures here and there, but still, GA gave the solution very fast. Something was fishy to me. Was GA too good for this problem? Turns out it was. What I was essentially trying to do, as Dr. Frank Moore rightly pointed out at the UGWS at GECCO 2009, to swat a fly with a cannon-ball. The GA was turning out to be so strong for the problem, that even before it was into the full fledged search, a solution was evident. In other words, a GA was too good for this problem. But wasn't this a question of random search of non-deterministic nature? Yes it was. So why not GA? 'Coz there is no structure to the solutions.
- There are no building blocks. Its like Death Valley, the fitness landscape. Can a random search outperform GA in this case? A nagging question was developing. I sought to find out. I conducted several runs, infact left my machine running overnight, trying out 100 to 10,000,000 input keys. The result: GA did perform better. But isn't it always supposed to perform better than random search? Yes it is! That is why we use GA and not random search. But what about the degree of better performance? How well did GA perform over random search? This was the most suspenseful aspect of the entire journey so far...
- Leiben and Smith's new book on "Introduction to Evolutionary Computing," has a very nice graph. Its a 1980s view of the performance of various algorithms and techniques on various problems. It showed that EA always performed better than Random Search. But studies since then suggested otherwise. There were certain problems where the overall tradeoff, in terms of performance and quality of solution, rendered random search a better contender than EA. Turns out, Hash Function selection, was one such problem! I realized that taking into account the computational time and resources the GA was taking to come up with a solution, and the quality of solutions it was producing, was not as good a deal, as random search. My idea was in jeopardy of being irrelevant. In the age of sophisticated precision ballistics, there was no need for antiquated cannons. Did GA meet its match?
- Thankfully it didn't. I had a hunch that GA would still do better. It consistently gave results in almost the same number of generations, no matter what the input keys distribution size or range be. This meant that If I wanted a certain level of performance, or a certain quality of a solution, GA would consistently give that in fixed number of generations. And this was the case! Turns out, random search took a lot longer and many more function evaluations on bigger problems to find a good solution, whereas GA found that solution easily, with lesser evaluations compared to random search. Victory to GA! Hurray. But not so fast. This means that GA can only be used for a large-scale problem of this kind.
- When I reached my destination, the final results, it was a satisfying feeling like no other. I learned more than what I started out with. It was like a wonderful journey. I learned a very valuable lesson about the kind of problems GA is good at and not so good at. The initial defeat of GA at the hands of random search taught me some properties of GA. And the most important lesson of them all: NEVER USE A CANNON TO SWAT A FLY.
- I learned about:
- Understanding the factors
which affect the results of a GA
- Factors which cause GA to
be stuck in local optima
- Theory of universal hash
functions, and tradeoffs in random selection of a hash function
- Understanding the fitness
landscape of an optimization problem
- Preparing arguments to
present the findings of a research work and supporting the claim
Technique to find Optimal Nucleotide Position within the sequence for Site
Specific Mutagenesis to downregulate over-expression in Myc Gene
- This was a computer
simulation based project
- I was
the group leader of a team of 8 students in charge of developing a
simulation software for gene characterization, and using that software to
find Optimal Nucleotide Position within the sequence for Site
Specific Mutagenesis to downregulate over-expression in Myc Gene by using
appropriate characterization techniques.
- The project was mainly about developing a technique, a mathematical combination of sorts, which can uniquely characterize a given nucleotide sequence. This way, a change in any one position, would a different characteristic value than the original, and hence can be identified and worked on. It was then hypothesized, to use this characterization to perform secondary site reversion, to bring a mutated gene with a new characteristic value, back to its original characteristic value.
- Earlier, the group had used Electron Charge Density (of each atom in the nucleotide molecule), Dipole Moment (of the nucleotide molecule), and Entropic measures as the constituents in the formula for characteristic value of the gene. It was hypothesized that since these properties form the basis of all bonding and energies, and interaction with the proteins, they can be used to characterized a gene sequence through nucleotides. What was missing was a way to combine this measure, into a mathematical combination, so that a resultant value can be calculated. This combination needed to consider a lot of things:
- Value of each constituent nucleotide must depend on its own properties, and some influence factor of its neighbors (in rough terms, the properties of the nucloetide would be (1+x) times the characteristic value it should have). This influence factor is dependent on the nature of its neighbors, on its position within the sequence (start n stop codons have varied effects than the inner codons) and on the amino acid it translates to.
- Once this value is calculated, it needs to be combined in such a way, that two sequences, differing in nucleotide bases in only 2 positions, can have as much a difference in value between them, as possible. In other words, we intend to differentiate each point in the possible nucleotide sequence space, so that each sequence and its mutation can be viewed separately and worked with.
- Many such combination methods were analyzed and compared. They ranged from matrix operations (like cross products) to simple sensitivity analysis operations. From all of the possible solutions for reversion, i.e all of the reversible positions on the sequence, the most favorable position was hypothesized as the one where the distance of the new sequence after reversion, was closest to the original unmutated gene. A sample graph is like:
back to top ^
- Co-ordinating a project
group for software development
- Data analysis techniques in
- Plotting 2D and 3D graphs
using MATLAB and LabVIEW
- Literature review of topics
different from one's area of expertise
JAEA - a Java Based
benchmarking framework for EC based optimizing algorithms
back to top ^
- This was a Research
Assistantship at Centre for Computational Intelligence (C2I), School of Computer
Engineering (SCE) at Nanyang Technological University (NTU), Singapore
- The project was under the
direct supervision and guidance of Prof. Yew Soon Ong, Associate
Professor, SCE and Director, C2I labs.
- The project was about understanding optimization benchmarking, the kind of test problems that are used for benchmarking, the properties that are benchmarked, and the reporting framework and methodology used for benchmarking. The frameworks used at GECCO and CEC were used for developing the understanding. Using a pre-existing EA framework at NTU, a new optimization framework was built in Java for Optimization benchmarking of EA.
- The project included a 2.5
months stay in Singapore working directly with the faculty and PhD
researchers who would use the framework in their studies. The project is still under progress.
- I learned about:
- Commonly used
Optimization Benchmarking frameworks in CEC 2005 (http://www3.ntu.edu.sg/home/EPNSugan/index_files/CEC-05/CEC05.htm ) and GECCO 2009
- Combining the test
functions found in these frameworks and their functionalities into an
all-encompassing framework consisting of all the features and more
- Included around 70 real
parameter optimization functions
- Included 2 real world
optimization problems concerning molecular cluster arrangement
- Advanced reporting
technique combining those of the 2 mentioned above and including new
- Added new performance
metrics for analyzing various optimization algorithms
- Communication in a research
team through weekly reports, seminars, brainstorming and one-on-one