Homework: Algorithms

Choose a simple algorithm and a standard description of it, such as linear search in a sorted array. Rewrite the algorithm in prosecode. Repeat the exercise with a more interesting algorithm, such as heapsort. Now choose an algorithm with an asymptotic cost analysis. Rewrite the algorithm as literate code, incorporating the important elements of the analysis into the algorithm’s description.

Linear Search as Prosecode
LinearSearch(x,l) searches for the position of the item x in an ordered list l. For this purpose, every item i in the list is compared with x. If an item equals x, its position is returned. Otherwise, when the end of the list is reached, the algorithm returns an invalid position.

1. (Initialize) Set the starting position p ← 0.
2. (Search) For each position p in l:
(a) If the item i at position p equals the searched element x, return p and stop the search.
(b) Otherwise continue with the next position.
3. (Return) If x was not found in the list, return an invalid position Ω.
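
A minimal code sketch of this prosecode, here in Python, might look as follows; the function name linear_search and the use of None as the invalid position Ω are illustrative choices of mine, not part of the description above.

    def linear_search(x, l):
        """Search for the position of x in the list l."""
        for p, item in enumerate(l):    # 2. (Search) visit every position p in l
            if item == x:               # 2.(a) the item at p equals x: stop the search
                return p
        return None                     # 3. (Return) x was not found; None stands in for Ω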

Heapsort as Prosecode
HeapSort(l) sorts the elements in a list l.
At first, the list is turned into a max-heap. A max-heap is basically a complete binary tree with the additional property that each child node is smaller than or equal to its parent node. This means that the largest value is always at the root node of the binary tree.
A sorted array is created by repeatedly removing the value from the root of the heap and inserting it into the array. After the removal of the root, the heap needs to be updated.

1. (Initialize) Turn the list l into a max-heap.
2. (Sort) The largest value of the heap is copied to a sorted array. Afterwards, the heap has to be updated so that the next largest value becomes the new root. This is repeated until the heap is empty.
(a) Take the root r and add it to the array a.
(b) Update the remaining heap.
3. (Return) If the heap is empty, return a.
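
The following Python sketch follows these steps under a few assumptions of my own: the heap is stored in a plain list, sift_down is a helper I introduce to restore the max-heap property, and the result is reversed at the end so that it is returned in ascending order.

    def heap_sort(l):
        """Sort l by building a max-heap and repeatedly removing its root."""
        heap = list(l)
        size = len(heap)

        def sift_down(root, size):
            # Restore the max-heap property for the subtree rooted at index `root`.
            while True:
                largest = root
                left, right = 2 * root + 1, 2 * root + 2
                if left < size and heap[left] > heap[largest]:
                    largest = left
                if right < size and heap[right] > heap[largest]:
                    largest = right
                if largest == root:
                    return
                heap[root], heap[largest] = heap[largest], heap[root]
                root = largest

        # 1. (Initialize) Turn the list into a max-heap.
        for i in range(size // 2 - 1, -1, -1):
            sift_down(i, size)

        # 2. (Sort) Copy the largest value (the root) into the array a,
        #    then update the remaining heap; repeat until the heap is empty.
        a = []
        while size > 0:
            a.append(heap[0])          # 2.(a) take the root r and add it to a
            size -= 1
            heap[0] = heap[size]       # move the last leaf to the root ...
            sift_down(0, size)         # 2.(b) ... and update the remaining heap

        # 3. (Return) a holds the values from largest to smallest;
        #    reverse it to get an ascending order.
        return a[::-1]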

BogoSort as Literate Code
BogoSort(l) sorts the elements of a list l.

The major steps of the algorithm are as follows:
1. Generate a permutation of the items of the list.
2. Check if the elements in the permutation are ordered.
3. Return the list or repeat.

We now examine these steps in detail.
1. Generate a permutation of the items of the list.
Basically, we want to shuffle the order of the items in the list at random.
This step is really important, as it allows us to reach a performance of O(n) in the best case. The best case means that the first permutation we generate is already the sorted one.

2. Check if the elements in the permutation are ordered.
Each item i in the list l is compared to the next item n. If i is smaller than or equal to n in all cases, the list is sorted.

3. Return the list or repeat.
If the second step yields a positive result, the sorted list l is returned. Otherwise, we start again at the first step.
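
A short Python sketch of these three steps, assuming random.shuffle as the source of permutations and a helper is_sorted of my own for the ordering check, could look like this.

    import random

    def bogo_sort(l):
        """Sort l by generating random permutations until one is ordered."""
        def is_sorted(l):
            # 2. Compare each item with the next one.
            return all(l[i] <= l[i + 1] for i in range(len(l) - 1))

        while True:
            random.shuffle(l)    # 1. Generate a permutation of the items at random.
            if is_sorted(l):     # 2. Check whether the permutation is ordered.
                return l         # 3. Return the sorted list, or repeat from step 1.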

Cost analysis
As mentioned above, this algorithm has a great best-case performance of O(n). However, the average-case performance is much worse, because the algorithm typically has to repeat many times; the average-case performance is O((n+1)!). In the worst case, no solution is ever found and the algorithm does not terminate. Of course, this should only happen if a bad random number generator is used.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Nearly forgot about this homework.

Analysis of “Fighting for breath”

Read the text “Fighting for breath” in Tim Skern’s “Writing Scientific English” (p. 50f) and analyze its adherence to the aspects of good style.

The text “Fighting for breath” was written by Dr. Mark Porter and published in the BBC Radio Times in the year 1999. The author discusses the influence of air pollution on health problems.

We learned about good style with guidelines that were sometimes specific to scientific writing. While the given text seems to be a newspaper article, many of those rules apply here, too.
Still, I left out some points of negative criticism. For instance, in a scientific paper I would point out the lack of citations, but this does not seem appropriate in this case.

There is not much to criticize about this text apart from some minor flaws. For example, the title does not clearly state the topic and tries to be catchy. Furthermore, there are some parts of sentences that could be removed.
Additionally, some words in the text were set in italics, which was irritating because it placed emphasis on words that did not need it. However, I am not certain whether this use of italics was intended by the original author or added later by Tim Skern.

The text showcases a lot of properties of good style. In my opinion, the variety of sentence and paragraph length as well as of sentence structure was appropriate. The active voice and the correct tenses were employed. Moreover, jargon was not used.

All in all, the author adheres to the aspects of good style.

Zobel’s Checklist

Regarding hypotheses and questions
What phenomena or properties are being investigated? Why are they of interest?

The paper proposes a new approach to the search for related and similar documents. This can be useful for document retrieval and recommendation.
Their approach is graph-based, which usually goes hand in hand with expensive graph operations. The authors claim to circumvent this by applying the needed knowledge from the graph to the documents during a pre-processing step.

Has the aim of the research been articulated? What are the specific hypotheses and research questions? Are these elements convincingly connected to each other?

The authors want to show three things:

  • Their approach provides higher correlation with human notions of document similarity than comparable measures.
  • This holds true for short documents with few annotations.
  • The calculation of document similarity is more efficient compared to graph-traversal-based approaches.

To what extent is the work innovative? Is this reflected in the claims?

The work provides a new and better approach to the assessment of relatedness of documents.
Yes, this is reflected in the claims.

What would disprove the hypothesis? Does it have any improbable consequences?

Their hypotheses would be disproved by experimental results that show a lower correlation with human notions of document similarity than comparable measures for normal or short document lengths, or a less efficient calculation of document similarity compared to graph-traversal-based approaches.

What are the underlying assumptions? Are they sensible?

They assume that graph-based methods are not fast enough and that their approach is better than other approaches at finding similar documents. Both assumptions are sensible.

Has the work been critically questioned? Have you satisfied yourself that it is sound science?

I think so.
Regarding evidence and measurement
What forms of evidence are to be used? If it is a model or a simulation, what
demonstrates that the results have practical validity?

An experiment using the standard benchmark for multiple sentence document similarity.

How is the evidence to be measured? Are the chosen methods of measurement objective, appropriate, and reasonable?

They use the Pearson and Spearman correlation and their harmonic mean as well as a quality ranking using “Normalized Discounted Cumulative Gain”.
The authors state that those metrics are used in related work as well. I can’t say for sure whether those metrics are objective and appropriate, but they allow comparison with other work, which seems reasonable.

What are the qualitative aims, and what makes the quantitative measures you have chosen appropriate to those aims?

I can’t say if they are appropriate, because I don’t know anything about those measures.

What compromises or simplifications are inherent in your choice of measure?

Because they only want to measure how well relevant documents are discovered, the quality evaluation is only applied to the top elements their approach turns up.
Additionally, they use a benchmark, which may or may not replicate real data.

Will the outcomes be predictive?

I’m not really sure what this question aims at.

What is the argument that will link the evidence to the hypothesis?

Standard measures and standard benchmarks ensure that the results can at least partially confirm or disprove the hypotheses.

To what extent will positive results persuasively confirm the hypothesis? Will negative results disprove it?

As only a benchmark and not real data is used in the experiment, the results cannot totally prove or disprove the hypotheses. Nevertheless, the results can give an indication towards one or the other.

What are the likely weaknesses of or limitations to your approach?

The pre-processing step has to be performed first, which probably takes time.

Abstract

The 1990s were marked by the invention of quantum algorithms, like the one by P. W. Shor, which is used for the factorization of integers, or the algorithm for fast database searches by L. K. Grover. Those algorithms can be used to crack widespread cryptographic systems like RSA, which relies on the difficulty of factorizing large numbers.

If classic cryptographic systems are cracked, all past and future communication using those systems is compromised. Quantum encryption solves this problem, as it can only be successfully attacked in real time. For its implementation, quantum key distribution systems are needed.

This paper surveys the state of the art of quantum key distribution systems as well as their properties.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Can I use acronyms like RSA? It stands for “Rivest, Shamir and Adleman”, which sounds kind of silly in the middle of a sentence.

Punctuation Game

We live in the era of Big Data, with storage and transmission capacity measured not just in terabytes but in petabytes (where peta- denotes a quadrillion, or a thousand trillion). Data collection is constant and even insidious, with every click and every “like” stored somewhere for something. This book reminds us that data is anything but “raw”; that we shouldn’t think of data as a natural resource but as a cultural one that needs to be generated, protected, and interpreted. The book’s essays describe eight episodes in the history of data – from the predigital to the digital. Together they address such issues as the ways that different kinds of data and different domains of inquiry are mutually defining, how data are variously “cooked” in the processes of their collection and use, and conflicts over what can or can’t be “reduced” to data. Contributors discuss the intellectual history of data as a concept, describe early financial modeling and some unusual sources for astronomical data, discover the prehistory of the database in newspaper clippings and index cards, and consider contemporary “dataveillance” of our online habits, as well as the complexity of scientific data curation.

During succession, ecosystem development occurs, but in the long-term absence of catastrophic disturbance a decline phase eventually follows. We studied six long-term chronosequences in Australia, Sweden, Alaska, Hawaii and New Zealand; for each, the decline phase was associated with a reduction in tree basal area and an increase in the substrate nitrogen-to-phosphorus ratio, indicating increasing phosphorus limitation over time. These changes were often associated with reductions in litter decomposition rates, phosphorus release from litter, and biomass and activity of decomposer microbes. Our findings suggest that the maximal biomass phase reached during succession cannot be maintained in the long-term absence of major disturbance, and that similar patterns of decline occur in forested ecosystems spanning the tropical, temperate, and boreal zones.

Facebook’s Graph API is an API for accessing objects and connections in Facebook’s social graph. To give some idea of the enormity of the social graph underlying Facebook, it was recently announced that Facebook has 901 million users, and the social graph consists of many types beyond just users. Until recently, the Graph API provided data to applications only in a JSON format. In 2011, an effort was undertaken to provide the same data in a semantically enriched RDF format containing Linked Data URIs. This was achieved by implementing a flexible and robust translation of the JSON output to a Turtle output. This paper describes the associated design decisions, the resulting Linked Data for objects in the social graph, and known issues.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I finished this assignment on Thursday morning. I’m sorry I only uploaded it now. Btw, this was really hard.

Research

Lo, H. K., Spiller, T., & Popescu, S. (1998). Introduction to quantum computation and information. World Scientific.

This book explains the basics of quantum mechanics and of quantum computation in an easily understandable manner. Really useful if you are new to the topic, and a surprisingly good read.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Bernstein, D. J., Buchmann, J., & Dahmen, E. (Eds.). (2009). Post-quantum cryptography. Springer Science & Business Media.

This book explains which encryption algorithms can be cracked by quantum computation. It also discusses which algorithms could be used instead.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Shor, P. W. (1999). Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM review, 41(2), 303-332.

This is the paper in which Shor proposed his algorithm. It is one of the two very important algorithms in quantum computation, which is reflected in its 5,702 citations. The algorithm can factorize integers far faster than algorithms that run on classical computers, which is why quantum computers will be really good at cracking RSA encryption (and a few others).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Grover, L. K. (1996, July). A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing (pp. 212-219). ACM.

This is the paper in which Grover’s algorithm, the other really important one, was proposed. The algorithm is far faster at finding certain entries in lists or databases than any conventional algorithm.
The paper was cited 3,522 times.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Gottesman, D., & Lo, H. K. (2001). From quantum cheating to quantum security. arXiv preprint quant-ph/0111100.

This paper explains why both algorithms are dangerous to certain encryption techniques. It was only cited 59 times (not much in comparison to the others), but all papers by the same author that I came across seem trustworthy. Also, he is one of the co-authors of the book I mentioned first, which was cited 347 times and made a positive impression on me.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I found a few other books and papers, but I barely skimmed most of them. I didn’t even finish everything I mentioned (two technical books and three longish papers take a bit of time to read), so I have much to read.

It is possible that I will leave out some of the topics covered by the papers and books above. But I am new to this topic, so everything mentioned above is interesting and could prove useful later.

Summary of “Warp Drive Research Key to Interstellar Travel”

The text “Warp Drive Research Key to Interstellar Travel” was written by Mark Alpert and published as an article in Scientific American.

The author starts off with the history of the fictional warp-drive engine, which is a really important piece of technology in the Star Trek Universe. For example, it enables the famous starship Enterprise to travel faster than light.

This technology, however, is currently being investigated at the Johnson Space Center in Houston. Harold White, the head of the advanced propulsion program there, works on an experiment with only $50,000 of funding. If it is successful, it may provide clues for the development of a device that could warp spacetime. With the help of this device, faster-than-light travel could be possible. This is important because the only spacecraft that has left our solar system so far would need about 70,000 years to reach any nearby star. With a warp drive, the time needed to get to another star system would shrink to mere weeks.

Voyager 1 is the name of this probe that has left our system behind. It was originally sent out by NASA to study Jupiter, Saturn and their moons.

Because many people share the enthusiasm for interstellar travel, organisations were created that explore more realistic means of space travel, which come with their own unique challenges. One of them is Icarus Interstellar, which is looking into nuclear-fusion-propelled craft. Their main problem is that nuclear fusion is not yet fully developed, despite having been researched for 50 years.

Interstellar travel itself comes with a myriad of problems. For example, there is space dust, which doesn’t sound dangerous at all, but becomes quite lethal at high or even relativistic speeds. Spacecraft need protective plating to withstand those impacts. Additionally, deceleration is needed to arrive safely at a destination, which requires additional fuel.

Regardless of the challenges, research into interstellar travel is still ongoing. The exploration of other star systems or even Earth-like planets is said to be essential to the survival of mankind. If humanity is spread out across multiple worlds, planet-wide calamities, like asteroid impacts or a nuclear war, are less likely to wipe out all human beings.

Homework

Assignment 1:

Done.

Assignment 2:

  1. State of the art of urban Internet of Things
  2. Generation of string test cases via multi-objective optimization

Assignment 3:

Not really done. I can think of a few general topics that sound interesting to me. However, I can’t tell which one I should choose. They all seem too broad, and some of them could be too difficult for me.

Here is the list of interesting topics:

  • Quantum computation
  • Bio-Computation (for example DNA-Computation)
  • Unconventional computation in general
  • Artificial intelligence

There are most certainly many other interesting topics. I am open to suggestions and would gladly accept help in finding a topic.