Sardinas–Patterson algorithm

In coding theory, the Sardinas–Patterson algorithm is a classical algorithm for determining in polynomial time whether a given variable-length code is uniquely decodable, named after August Albert Sardinas and George W. Patterson, who published it in 1953.[1] The algorithm carries out a systematic search for a string which admits two different decompositions into codewords. As Knuth reports, the algorithm was rediscovered about ten years later in 1963 by Floyd, despite the fact that it was at the time already well known in coding theory.[2]

Contents

  • 1 Idea of the algorithm
  • 2 Precise description of the algorithm
  • 3 Termination and correctness of the algorithm
  • 4 See also
  • 5 Notes
  • 6 References

Idea of the algorithm

Consider the code \{\, a \mapsto 1, b \mapsto 011, c\mapsto 01110, d\mapsto 1110, e\mapsto 10011\,\}. This code, which is based on an example by Berstel,[3] is an example of a code which is not uniquely decodable, since the string

011101110011

can be interpreted as the sequence of codewords

01110 – 1110 – 011,

but also as the sequence of codewords

011 – 1 – 011 – 10011.

Two possible decodings of this encoded string are thus given by cdb and babe.

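The two decodings are easy to check mechanically. The following Python snippet is an illustrative sketch, not part of the original presentation: it concatenates the codewords for cdb and for babe and confirms that both give the string above.

    # Sketch: verify that cdb and babe encode to the same bit string.
    code = {"a": "1", "b": "011", "c": "01110", "d": "1110", "e": "10011"}

    def encode(word):
        """Concatenate the codewords of the letters of `word`."""
        return "".join(code[letter] for letter in word)

    assert encode("cdb") == "011101110011"
    assert encode("babe") == "011101110011"
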
In general, a codeword can be found by the following idea: In the first round, we choose two codewords x_1 and y_1 such that x_1 is a prefix of y_1, that is, x_1w = y_1 for some "dangling suffix" w. If one tries first x_1=011 and y_1=01110, the dangling suffix is w = 10. If we manage to find two sequences x_2,\ldots,x_p and y_2,\ldots,y_q of codewords such that x_2\cdots x_p = wy_2\cdots y_q, then we are finished: for then the string x = x_1x_2\cdots x_p can alternatively be decomposed as y_1y_2\cdots y_q, and we have found the desired string having at least two different decompositions into codewords. In the second round, we try out two different approaches: the first trial is to look for a codeword that has w as a prefix. Then we obtain a new dangling suffix w', with which we can continue our search. If we eventually encounter a dangling suffix that is itself a codeword (or the empty word), then the search terminates, as we know there exists a string with two decompositions. The second trial is to look for a codeword that is itself a prefix of w. In our example, we have w = 10, and the sequence 1 is a codeword. We can thus also continue with w'=0 as the new dangling suffix.

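To make the two trials concrete, the following sketch (in Python, with variable names of our own choosing, not taken from the article) computes the dangling suffixes of the first round for the example code and then applies both trials to w = 10.

    # Sketch: first-round dangling suffixes for the example code, and the
    # two trials applied to the dangling suffix w = "10".
    codewords = {"1", "011", "01110", "1110", "10011"}

    # Suffixes left over when one codeword is a proper prefix of another.
    dangling = {y[len(x):] for x in codewords for y in codewords
                if x != y and y.startswith(x)}
    print(sorted(dangling))  # ['0011', '10', '110']

    w = "10"
    # First trial: codewords having w as a prefix leave new dangling suffixes.
    print({c[len(w):] for c in codewords if c.startswith(w)})  # {'011'} -- itself a codeword
    # Second trial: codewords that are prefixes of w leave new dangling suffixes.
    print({w[len(c):] for c in codewords if w.startswith(c) and len(c) < len(w)})  # {'0'}

In this sketch the first trial already yields the codeword 011 as a new dangling suffix, which signals that the code is not uniquely decodable; the text above follows the second trial and continues with w' = 0.
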
Precise description of the algorithm

The algorithm is described most conveniently using quotients of formal languages. In general, for two sets of strings D and N, the (left) quotient N^{-1}D is defined as the set of residual words obtained from D by removing some prefix in N. Formally, N^{-1}D = \{\, y \mid xy \in D \text{ and } x \in N \,\}. Now let C denote the (finite) set of codewords in the given code.

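In code, the left quotient has a direct transcription. The helper below is a minimal Python sketch under our own naming, not notation from the article.

    # Sketch: the left quotient N^{-1}D of two finite sets of strings, i.e. the
    # residues of words in D after removing a prefix belonging to N.
    def left_quotient(N, D):
        return {d[len(n):] for n in N for d in D if d.startswith(n)}

    # For the example code, C^{-1}C contains the empty word (from identical pairs)
    # together with the proper dangling suffixes.
    C = {"1", "011", "01110", "1110", "10011"}
    print(sorted(left_quotient(C, C)))  # ['', '0011', '10', '110']
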
The algorithm proceeds in rounds, where we maintain in each round not only one dangling suffix as described above, but the (finite) set of all potential dangling suffixes. Starting with round i=1, the set of potential dangling suffixes will be denoted by S_i. The sets S_i are defined inductively as follows:

S_1 = C^{-1}C \setminus \{\varepsilon\}. Here, the symbol \varepsilon denotes the empty word.

S_{i+1} = C^{-1}S_i \cup S_i^{-1}C, for all i \ge 1.

The algorithm computes the sets S_i in increasing order of i. As soon as one of the S_i contains a word from C or the empty word, the algorithm terminates and answers that the given code is not uniquely decodable. Otherwise, once a set S_i equals a previously encountered set S_j with j < i, the algorithm would in principle enter an endless loop; instead of continuing endlessly, it answers that the given code is uniquely decodable.

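Putting the pieces together, the whole test can be sketched as follows. This is an illustrative Python transcription of the procedure just described, not code from the original article; the function names are ours.

    # Sketch of the test described above (function names are ours).
    def left_quotient(N, D):
        """Left quotient N^{-1}D of two finite sets of strings."""
        return {d[len(n):] for n in N for d in D if d.startswith(n)}

    def is_uniquely_decodable(codewords):
        C = set(codewords)
        S = left_quotient(C, C) - {""}      # S_1 = C^{-1}C minus the empty word
        seen = set()
        while S:
            if "" in S or S & C:            # S_i contains the empty word or a codeword
                return False                #   -> not uniquely decodable
            frozen = frozenset(S)
            if frozen in seen:              # S_i equals an earlier S_j, j < i
                return True                 #   -> uniquely decodable
            seen.add(frozen)
            S = left_quotient(C, S) | left_quotient(S, C)   # S_{i+1}
        return True                         # no dangling suffixes remain

    print(is_uniquely_decodable(["1", "011", "01110", "1110", "10011"]))  # False
    print(is_uniquely_decodable(["0", "10", "110", "111"]))               # True

On the example code this sketch stops in the second round, since S_2 = {0, 10, 011} contains the codeword 011; for the prefix code {0, 10, 110, 111} the set S_1 is already empty, so the loop never runs and the code is reported as uniquely decodable.
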
Termination and correctness of the algorithm

Since all sets S_i are sets of suffixes of a finite set of codewords, there are only finitely many different candidates for S_i. Since visiting one of the sets for the second time will cause the algorithm to stop, the algorithm cannot continue endlessly and thus must always terminate. More precisely, the total number of dangling suffixes that the algorithm considers is at most equal to the total of the lengths of the codewords in the input, so the algorithm runs in polynomial time as a function of this input length. By using a suffix tree to speed up the comparison between each dangling suffix and the codewords, the time for the algorithm can be bounded by O(nk), where n is the total length of the codewords and k is the number of codewords.[4] The algorithm can be implemented using a pattern matching machine.[5] It can also be implemented to run on a nondeterministic Turing machine that uses only logarithmic space; the problem of testing unique decipherability is NL-complete, so this space bound is optimal.[6]

A proof that the algorithm is correct, i.e. that it always gives the correct answer, is found in the textbooks by Salomaa[7] and by Berstel et al.[8]

See also

Notes

  1. ^ Sardinas & Patterson (1953).
  2. ^ Knuth (2003), p. 2.
  3. ^ Berstel et al. (2009), Example 2.3.1, p. 63.
  4. ^ Rodeh (1982).
  5. ^ Apostolico and Giancarlo (1984).
  6. ^ Rytter (1986) proves that the complementary problem, of testing for the existence of a string with two decodings, is NL-complete, and therefore that unique decipherability is co-NL-complete. The equivalence of NL-completeness and co-NL-completeness follows from the Immerman–Szelepcsényi theorem.
  7. ^ Salomaa (1981).
  8. ^ Berstel et al. (2009), Chapter 2.3.

References

  • Berstel, Jean; Perrin, Dominique; Reutenauer, Christophe (2010). Codes and automata. Encyclopedia of Mathematics and its Applications 129. Cambridge: Cambridge University Press.
  • Berstel, Jean; Reutenauer, Christophe (2011). Noncommutative rational series with applications. Encyclopedia of Mathematics and Its Applications 137. Cambridge: Cambridge University Press.
  • Knuth, Donald E. (2003). "Robert W Floyd, in memoriam". ACM SIGACT News 34 (4): 3–13.
  • Rodeh, M. (1982). "A fast test for unique decipherability based on suffix trees (Corresp.)". IEEE Transactions on Information Theory 28 (4): 648–651.
  • Apostolico, A.; Giancarlo, R. (1984). "Pattern matching machine implementation of a fast test for unique decipherability". Information Processing Letters 18 (3): 155–158.
  • Rytter, Wojciech (1986). "The space complexity of the unique decipherability problem". Information Processing Letters 23 (1): 1–3.
  • Salomaa, Arto (1981). Jewels of Formal Language Theory. Computer Science Press.
  • Sardinas, August Albert; Patterson, George W. (1953). "A necessary and sufficient condition for the unique decomposition of coded messages". Convention Record of the I.R.E., 1953 National Convention, Part 8: Information Theory: 104–108.
