The specification of different aspects of Heb12, including how various front ends should interact with back ends and what each should provide

formats.md 3.9 KiB

# Formats

Heb12 supplies two formats for storing the Bible: haplous and BibleC.

## haplous

haplous (ἁπλοῦς) - simple

A simple, fast, and extensible format for reading the Bible.

**Note:** While nous (the official haplous parser) is being written, all of haplous should be considered unstable and may change completely from day to day.

### Principles

These are the main principles of the format, in order of priority:

1. Speed
2. Simplicity
3. Flexibility

#### Speed

The format should be optimized for quick parsing. It should be trivial to find text in one pass without having to read the whole file into memory.

#### Simplicity

All configuration of the format should be obvious and minimal, so that files are easy to parse and easy for humans to read. Everything should have an obvious purpose even without reading the spec.

#### Flexibility

Since the main format is just plain text, configuration can be added in many forms while still maintaining simplicity.

### Spec

#### Vocabulary

- "Work" is a Bible document and its related metadata
- "Metadata" refers to the information above the document
- "Document" is the text itself and related information

#### Metadata

Metadata MUST be stored at the beginning of the file, in the form of `#id:value`, where `id` names the field.

Each Work MUST include this information:

- `lang`: the language code for the document
- `title`: the title of the document
- `id`: the short ID of the document
- `public_domain`: whether or not the document is in the public domain
- `type`: the type of document (most often "bible")

Metadata MUST NOT appear after the document.
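
As an illustration of the rules above, here is a minimal Python sketch of a metadata reader. It is not the nous parser; the names `parse_metadata` and `REQUIRED_FIELDS` are hypothetical, and it assumes that the metadata block ends at the first `#book:` line and that values are kept as raw strings.

```python
# Required fields, as listed in the spec above.
REQUIRED_FIELDS = {"lang", "title", "id", "public_domain", "type"}


def parse_metadata(lines):
    """Collect leading `#id:value` lines into a dict.

    Assumption: the metadata block ends at the first `#book:` line
    (or at the first line that does not start with `#`).
    """
    metadata = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#book:") or not line.startswith("#"):
            break  # the document has started
        field, _, value = line[1:].partition(":")
        metadata[field] = value  # values are kept as raw strings
    missing = REQUIRED_FIELDS - set(metadata)
    if missing:
        raise ValueError(f"missing required metadata: {sorted(missing)}")
    return metadata


header = [
    "#lang:en",
    "#title:World English Bible",
    "#id:WEB",
    "#public_domain:true",
    "#type:bible",
    "#book:Gen",
]
metadata = parse_metadata(header)
print(metadata["title"], metadata["public_domain"])
```

Because the function stops at the first `#book:` line, it never reads the document itself, which keeps metadata lookups cheap even for a full Bible.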

#### Document

The actual text of the Work MUST be divided into books. The start of a book is marked by `#book:id`, and the book ends where the next book begins.

### Examples

```
#lang:en
#title:World English Bible
#id:WEB
#public_domain:true
#type:bible
#book:Gen
#chapter:1
In the beginning God created the heavens and the earth.
Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.
... rest of Genesis 1
^
#chapter:2
...
^
#book:exod
#chapter:1
Now these are the names of the children of Israel, who came into Egypt; every man and his household came with Jacob.
Reuben, Simeon, Levi, and Judah,
^
```

### Parsing strategies

#### Chapter

1. Find the requested book
2. Find the requested chapter
3. Collect lines until `^` is found
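
The three steps above can be sketched in Python as a single pass over the lines. This is only an illustration, not the nous implementation: `read_chapter` and `SAMPLE` are hypothetical names, the verse text is placeholder data, and matching book IDs case-insensitively is an assumption (the example file mixes `Gen` and `exod`).

```python
def read_chapter(lines, book, chapter):
    """Return the lines of one chapter, scanning the input once."""
    in_book = False
    in_chapter = False
    collected = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#book:"):
            # 1. Find the requested book (case-insensitive: an assumption).
            in_book = line[len("#book:"):].lower() == book.lower()
            in_chapter = False
        elif in_book and line.startswith("#chapter:"):
            # 2. Find the requested chapter.
            in_chapter = line[len("#chapter:"):] == str(chapter)
        elif in_chapter:
            # 3. Collect lines until `^` is found.
            if line == "^":
                break
            collected.append(line)
    return collected


# Placeholder text standing in for real verses.
SAMPLE = """\
#lang:en
#book:Gen
#chapter:1
gen 1:1
gen 1:2
^
#chapter:2
gen 2:1
^
#book:exod
#chapter:1
exod 1:1
exod 1:2
^
""".splitlines()

print(read_chapter(SAMPLE, "Gen", 2))
print(read_chapter(SAMPLE, "Exod", 1))
```

Since `read_chapter` accepts any iterable of lines, an open file object can be passed directly, so the file is read line by line rather than loaded into memory.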

#### Verses

1. Find the requested book
2. Find the requested chapter
3. Find the start verse
4. Collect lines until the end verse is found
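
A verse range can be read the same way. The sketch below is a hedged Python illustration, not the nous implementation. It assumes that each text line inside a chapter is exactly one verse, numbered from 1 in file order, which the spec above does not state explicitly. `read_verses` and `SAMPLE` are hypothetical names, and the verse text is placeholder data.

```python
def read_verses(lines, book, chapter, start, end):
    """Return verses start..end (inclusive) of one chapter in one pass."""
    in_book = False
    in_chapter = False
    verse_no = 0
    collected = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#book:"):
            # 1. Find the requested book (case-insensitive: an assumption).
            in_book = line[len("#book:"):].lower() == book.lower()
            in_chapter = False
        elif line.startswith("#chapter:"):
            # 2. Find the requested chapter.
            in_chapter = in_book and line[len("#chapter:"):] == str(chapter)
            verse_no = 0
        elif in_chapter:
            if line == "^":
                break  # chapter ended before the end verse
            # Assumption: one line per verse, counted from 1.
            verse_no += 1
            if verse_no >= start:
                collected.append(line)  # 3. start verse reached
            if verse_no == end:
                break  # 4. end verse collected
    return collected


# Placeholder text standing in for real verses.
SAMPLE = """\
#book:Gen
#chapter:1
gen 1:1
gen 1:2
gen 1:3
^
#book:exod
#chapter:1
exod 1:1
exod 1:2
^
""".splitlines()

print(read_verses(SAMPLE, "Gen", 1, 2, 3))
print(read_verses(SAMPLE, "Exod", 1, 2, 2))
```

The loop stops as soon as the end verse has been collected, so lines after the requested range are never read.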

## BibleC

BibleC is a tiny format for storing Bible text.

It is designed to be:

1. Extremely minimal (one C source file, under 200 lines)
2. Flexible and hackable: it is easy to understand how the code works

It was built with a "Make it simple and keep it simple" design philosophy.

### Design

The verses are simply stored in a file, separated by newlines. This allows for mass grammar/spelling fixes without interference from formatting characters.

A separate data structure stored in memory (it can also be loaded from an index file) is used to quickly calculate which line a verse or range of verses is on from its reference.

It does not require complicated parsing or memory allocations, so it is very easy to port to other languages and platforms.
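
To illustrate how a line number can be calculated from a reference, here is a small Python sketch. It does not reproduce BibleC's actual C code or index file layout. The index shape (a first line and a list of verses per chapter for each book), the names `INDEX`, `VERSES`, and `verse_line`, and the toy data are all assumptions made for the example.

```python
# Hypothetical index: book ID -> (first line of the book, verses per chapter).
# Toy data, not real chapter lengths.
INDEX = {
    "Gen": (0, [3, 2]),
    "Exod": (5, [2]),
}

# Toy verse file: one verse per line, no formatting characters.
VERSES = [
    "gen 1:1", "gen 1:2", "gen 1:3",
    "gen 2:1", "gen 2:2",
    "exod 1:1", "exod 1:2",
]


def verse_line(index, book, chapter, verse):
    """Return the 0-based line number of a verse using arithmetic only."""
    book_start, chapter_lengths = index[book]
    if not 1 <= chapter <= len(chapter_lengths):
        raise IndexError(f"no chapter {chapter} in {book}")
    if not 1 <= verse <= chapter_lengths[chapter - 1]:
        raise IndexError(f"no verse {verse} in {book} {chapter}")
    # Skip every earlier chapter, then step to the verse within the chapter.
    return book_start + sum(chapter_lengths[:chapter - 1]) + (verse - 1)


line = verse_line(INDEX, "Gen", 2, 1)
print(line, VERSES[line])
```

With the line number known, a reader can jump straight to the verse instead of inspecting every line before it, which is the property the design above relies on.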

## Comparison

A test was done to see whether BibleC or haplous was faster. Each verse was fetched 100 times, with a new instance set up each time. Note that BibleC loaded its index from the index file.

```
Gen 1 1:1
haplous: ~0.000225ms
biblec: ~0.002346ms
Exod 1 1:1
haplous: ~0.040692ms
biblec: ~0.014129ms
Rev 1 1:1
haplous: ~0.178406ms
biblec: ~0.759610ms
```

Both tests leaked no memory.

As you can see, haplous is faster for `Gen 1 1:1`, since it does not have to parse a 4 KB index file before reading the first verse.

Most of the time, BibleC is faster than haplous. BibleC first calculates the verse line, then seeks to it, whereas haplous has to parse every line it reads.
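
To put the figures above side by side, the following Python sketch computes how many times faster the quicker format was for each reference. The timings are copied from the results listed above; `timings_ms` and `compare` are hypothetical names. For these three samples it reports haplous as the faster format for `Gen 1 1:1` and `Rev 1 1:1`, and BibleC for `Exod 1 1:1`, so three references are too few to generalize from.

```python
# Timings in milliseconds, copied from the results above.
timings_ms = {
    "Gen 1 1:1": {"haplous": 0.000225, "biblec": 0.002346},
    "Exod 1 1:1": {"haplous": 0.040692, "biblec": 0.014129},
    "Rev 1 1:1": {"haplous": 0.178406, "biblec": 0.759610},
}


def compare(times):
    """Return (faster format, slower time divided by faster time)."""
    faster = min(times, key=times.get)
    slower = max(times, key=times.get)
    return faster, times[slower] / times[faster]


for reference, times in timings_ms.items():
    faster, ratio = compare(times)
    print(f"{reference}: {faster} is about {ratio:.1f}x faster")
```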