Hutter Prize

The Hutter Prize is a cash prize funded by Marcus Hutter that rewards data compression improvements on a specific 100 MB English text file. The prize awards 500 euros for each one percent improvement (with 50,000 euros total funding)[1] in the compressed size of the file enwik8, the smaller of the two files used in the Large Text Compression Benchmark; enwik8 is the first 100,000,000 characters of a specific version of English Wikipedia.[2] The ongoing competition is organized by Hutter, Matt Mahoney, and Jim Bowery.
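As a rough illustration of the payout rule stated above (500 euros per one percent, out of a 50,000 euro fund), the sketch below computes an award from two compressed sizes. It assumes the payout is linear in the relative size reduction, is rounded to whole euros, and is zero when the new entry is larger than 99% of the previous record; the function name `award_eur` and the rounding are illustrative assumptions, not the contest's official procedure. The sample sizes are the May 2009 record and its predecessor as reported in the History section.

```python
TOTAL_FUND_EUR = 50_000  # total funding stated in the article


def award_eur(previous_size: int, new_size: int) -> int:
    """Return an estimated award in whole euros (hypothetical helper).

    Assumptions: the payout is linear in the relative improvement
    (500 euros per percent), and an entry larger than 99% of the
    previous record earns nothing.
    """
    if previous_size <= 0 or new_size <= 0:
        raise ValueError("sizes must be positive byte counts")
    # Integer comparison avoids floating-point trouble at exactly 1%.
    if new_size * 100 > previous_size * 99:
        return 0
    return round(TOTAL_FUND_EUR * (previous_size - new_size) / previous_size)


if __name__ == "__main__":
    # 16,481,655 bytes (previous record) -> 15,949,688 bytes (new record)
    print(award_eur(16_481_655, 15_949_688))  # 1614
```

Under these assumptions the result agrees with the 1614 euro award reported for that submission, which suggests the payout is proportional to the fraction of bytes saved rather than paid in whole-percent steps.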

Contents

  • Goals
  • Rules
  • History
  • References
  • External links

Goals

The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems. Hutter proved[3] that the optimal behavior of a goal-seeking agent in an unknown but computable environment is to guess at each step that the environment is probably controlled by one of the shortest programs consistent with all interaction so far. Unfortunately, there is no general solution because Kolmogorov complexity is not computable. Hutter proved that in the restricted case (called AIXItl), where the environment is restricted to time t and space l, a solution can be computed in time O(t·2^l), which is still intractable.

The organizers further believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test. Thus, progress toward one goal represents progress toward the other.[4] They argue that predicting which characters are most likely to occur next in a text sequence requires vast real-world knowledge. A text compressor must solve the same problem in order to assign the shortest codes to the most likely text sequences.

Rules

The contest is open-ended and open to everyone. To enter, a competitor must submit a compression program and a decompressor that decompresses to the file enwik8.[2] It is also possible to submit a compressed file instead of the compression program. The total size of the compressed file and decompressor (as a Win32 or Linux executable) must not be larger than 99% of the previous prize-winning entry. For each one percent improvement, the competitor wins 500 euros. The decompression program must also meet execution time and memory constraints, currently 10 hours on a 2 GHz Pentium 4 with 1 GB memory. These constraints may be relaxed in the future.

Submissions must be published in order to allow independent verification. There is a 30-day waiting period for public comment before a prize is awarded. The rules do not require the release of source code, unless such release is required by the code's license (as in the case of PAQ, which is licensed under the GPL).

History

The prize was announced on August 6, 2006. The prize baseline was 18,324,887 bytes, achieved by PAQ8F.

On August 16, Rudi Cilibrasi submitted a modified version of PAQ8F called RAQ8G that added parenthesis modeling. However, it failed to meet the 1% threshold.

On the same day, but a few hours later, Dmitry Shkarin submitted a modified version of his DURILCA compressor called DURILCA 0.5h, which improved compression by 1.5%. However, it was disqualified for using 1.75 GB of memory. The decision to disqualify was controversial because the memory limits were not clearly specified in the rules at the time.

On August 21, Alexander Ratushnyak submitted PAQ8HKCC, a modified version of PAQ8H, which improved compression by 2.6% over PAQ8F. He continued to improve the compression to 3.0% with PAQ8HP1 on August 21, 4% with PAQ8HP2 on August 28, 4.9% with PAQ8HP3 on September 3, 5.9% with PAQ8HP4 on September 10, and 5.9% with PAQ8HP5 on September 25. At that point he was awarded 3416 euros and the new baseline was set to 17,245,509 bytes. He then improved on this by 1% with PAQ8HP6 on November 6, 2% with PAQ8HP7 on December 10, and 2.3% with PAQ8HP8 on January 18, 2007, for a compressed size of 16,681,045 bytes. On July 10, 2007, he once again broke his record with PAQ8HP12, achieving a size of 16,481,655 bytes, and was awarded 1732 euros. On May 23, 2009, he set a new record with decomp8, achieving a size of 15,949,688 bytes for an award of 1614 euros.

References

  1. Marcus Hutter, Human Knowledge Compression Contest, http://prize.hutter1.net/
  2. Matt Mahoney, About the Test Data, http://mattmahoney.net/dc/textdata.html
  3. Marcus Hutter, Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability, Springer, Berlin, 2004, http://www.hutter1.net/ai/uaibook.htm
  4. Matt Mahoney, Rationale for a Large Text Compression Benchmark, 2006, http://mattmahoney.net/dc/rationale.html

External links

  • Website of the Hutter Prize
