The specification of different aspects of Heb12, including how various front ends should interact with back ends and what each should provide


Formats

Heb12 supplies some formats for storing the Bible.

haplous

haplous (ἁπλοῦς) - simple

A simple, fast, and extensible format for reading the Bible.

Note: While nous (the official haplous parser) is being written, all of haplous should be considered unstable and may change completely from one day to the next.

Principles

These are the main principles of the format and their priority in comparison to each other.

  1. Speed
  2. Simplicity
  3. Flexibility

Speed

The format should be optimized for quick parsing. It should be trivial to find text in one pass without reading the whole file into memory.

Simplicity

All configuration of the format should be obvious and minimal to enable easy parsing and human readability. Everything should have an obvious purpose even without reading the spec.

Flexibility

Since the main format is just plain text, configuration can be added in many forms while still maintaining simplicity.

Spec

Vocabulary

  • “Work” is a Bible document and related metadata
  • “Metadata” refers to the information above the document
  • “Document” is the text itself and related information

Metadata

Metadata MUST be stored at the beginning of the file, one entry per line, in the form #key:value (for example, #lang:en).

Each Work MUST include this information:

  • lang: the language code for the document
  • title: the title of the document
  • id: the short ID of the document
  • public_domain: whether or not the document is in the public domain
  • type: the type of document (most often “bible”)

Metadata MUST NOT appear after the document.
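The metadata rules can be checked with a short C routine. This is a minimal sketch, not nous: parse_meta_line and check_required_metadata are hypothetical names, and because the spec does not say exactly where metadata ends, the sketch assumes it stops at the first blank line or the first #book: line, as in the example below.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Keys every Work must declare (from the list above). */
static const char *REQUIRED_KEYS[] = {"lang", "title", "id", "public_domain", "type"};
#define N_REQUIRED (sizeof REQUIRED_KEYS / sizeof REQUIRED_KEYS[0])

/* Hypothetical helper: split a "#key:value" line in place.
 * Returns false if the line is not of that form.
 * The trailing newline, if any, is stripped from the value. */
static bool parse_meta_line(char *line, char **key, char **value)
{
    if (line[0] != '#')
        return false;
    char *colon = strchr(line + 1, ':');
    if (colon == NULL)
        return false;
    *colon = '\0';
    *key = line + 1;
    *value = colon + 1;
    (*value)[strcspn(*value, "\r\n")] = '\0';
    return true;
}

/* Hypothetical helper: read metadata from the top of the file and report
 * whether every required key was seen. ASSUMPTION: metadata stops at the
 * first blank line or the first "#book:" line. */
static bool check_required_metadata(FILE *f)
{
    bool seen[N_REQUIRED] = {false};
    char line[1024];

    while (fgets(line, sizeof line, f) != NULL) {
        if (line[0] == '\n' || line[0] == '\r' || strncmp(line, "#book:", 6) == 0)
            break;
        char *key, *value;
        if (!parse_meta_line(line, &key, &value))
            continue; /* not a metadata line; ignored in this sketch */
        (void)value;  /* only the key matters for this check */
        for (size_t i = 0; i < N_REQUIRED; i++)
            if (strcmp(key, REQUIRED_KEYS[i]) == 0)
                seen[i] = true;
    }
    for (size_t i = 0; i < N_REQUIRED; i++)
        if (!seen[i])
            return false;
    return true;
}
```

Run on a freshly opened file, check_required_metadata returns true for the World English Bible header in the example below, and false if any of the five keys is missing.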

Document

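The actual text of the Work MUST be divided into books. A book starts with a #book:id line and ends where the next #book line begins. As the example below shows, each book is further divided into chapters: a chapter starts with #chapter:number, holds one verse per line, and ends with a line containing only ^.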

Examples

#lang:en
#title:World English Bible
#id:WEB
#public_domain:true
#type:bible

#book:Gen
#chapter:1
In the beginning God created the heavens and the earth.
Now the earth was formless and empty. Darkness was on the surface of the deep. God`s Spirit was hovering over the surface of the waters.
... rest of Genesis 1
^
#chapter:2
...
^

#book:exod
#chapter:1
Now these are the names of the children of Israel, who came into Egypt; every man and his household came with Jacob.
Reuben, Simeon, Levi, and Judah,
^

Parsing strategies

Chapter

  1. Find the requested book
  2. Find the requested chapter
  3. Collect lines until ^ is found
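The chapter strategy can be sketched in C as one forward pass that keeps only the current line in memory, in line with the Speed principle. This is an illustrative sketch, not nous: print_chapter and tag_matches are hypothetical names, lines are assumed to fit in a 4096-byte buffer, and book and chapter IDs are compared exactly as written, since the spec does not say whether matching ignores case.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 4096 /* assumed maximum line length */

/* Hypothetical helper: true if `line` is exactly "<prefix><value>",
 * optionally followed by a newline. */
static bool tag_matches(const char *line, const char *prefix, const char *value)
{
    size_t plen = strlen(prefix), vlen = strlen(value);
    if (strncmp(line, prefix, plen) != 0 || strncmp(line + plen, value, vlen) != 0)
        return false;
    char end = line[plen + vlen];
    return end == '\0' || end == '\n' || end == '\r';
}

/* Hypothetical: write every line of `chapter` in `book` to `out`.
 * Returns the number of lines written, or -1 if the chapter was not found. */
static int print_chapter(FILE *in, FILE *out, const char *book, const char *chapter)
{
    char line[LINE_MAX_LEN];
    bool in_book = false;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strncmp(line, "#book:", 6) == 0) {
            /* 1. Find the requested book; any other #book line ends it. */
            in_book = tag_matches(line, "#book:", book);
        } else if (in_book && tag_matches(line, "#chapter:", chapter)) {
            /* 2. Found the chapter. 3. Collect lines until "^". */
            int count = 0;
            while (fgets(line, sizeof line, in) != NULL && line[0] != '^') {
                fputs(line, out);
                count++;
            }
            return count;
        }
    }
    return -1;
}
```

For the example Work above, print_chapter(in, stdout, "exod", "1") would print the two Exodus lines shown and return 2. The exact-match check also keeps #chapter:10 from matching a request for chapter 1.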

Verses

  1. Find the requested book
  2. Find the requested chapter
  3. Find the start verse
  4. Collect lines up to and including the end verse
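The verse strategy differs from the chapter strategy only in its last two steps. The C sketch below assumes, based on the example above, that each line of a chapter holds exactly one verse, so verse n is the n-th line after its #chapter line. print_verses is a hypothetical name, not part of nous, and the end verse is treated as inclusive.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: write verses start..end (inclusive) of `book` chapter
 * `chapter` to `out`. ASSUMPTION: one verse per line, so verse n is the
 * n-th line after "#chapter:<chapter>". Returns the number of verses
 * written, or -1 if the chapter was not found. */
static int print_verses(FILE *in, FILE *out, const char *book, int chapter,
                        int start, int end)
{
    char line[4096], book_tag[256], chapter_tag[64];
    snprintf(book_tag, sizeof book_tag, "#book:%s", book);
    snprintf(chapter_tag, sizeof chapter_tag, "#chapter:%d", chapter);
    bool in_book = false;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0'; /* compare tags without the newline */
        if (strncmp(line, "#book:", 6) == 0) {
            in_book = strcmp(line, book_tag) == 0;              /* 1. find book */
        } else if (in_book && strcmp(line, chapter_tag) == 0) { /* 2. find chapter */
            int verse = 0, written = 0;
            while (fgets(line, sizeof line, in) != NULL && line[0] != '^') {
                verse++;
                if (verse < start)
                    continue; /* 3. skip lines before the start verse */
                if (verse > end)
                    break;    /* 4. stop once past the end verse */
                fputs(line, out);
                written++;
            }
            return written;
        }
    }
    return -1;
}
```

On the example Work, print_verses(in, stdout, "Gen", 1, 2, 2) would print only the second line of Genesis 1, the one beginning "Now the earth was formless and empty."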

BibleC

BibleC is a tiny format to store Bible text.

It is designed to be:

  1. Extremely minimal (one C source file, under 200 lines)
  2. Flexible and hackable (easy to understand how the code works)

It was built with a “Make it simple and keep it simple” design philosophy.

Design

The verses are simply stored in a file, one per line, separated by newlines. This allows mass grammar and spelling fixes without formatting characters getting in the way.

A separate data structure held in memory (which can also be loaded from an index file) is used to quickly calculate which line a verse, or range of verses, is on from its reference.

It does not require complicated parsing or memory allocations, so it is very easy to port to other languages and platforms.
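The line calculation can be sketched without any parsing or allocation, assuming the index is simply the number of verses in each chapter of each book, kept in the same book order as the verse file. This is not BibleC's actual index format or API: struct book_index, verse_line, and the toy counts used below are hypothetical. The line of a verse is the sum of the verse counts of everything before it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory index: verse counts per chapter for each book,
 * listed in the same order as the books appear in the verse file. */
struct book_index {
    const char *id;          /* e.g. "Gen" */
    int chapters;            /* number of chapters in the book */
    const int *verse_counts; /* verse_counts[c] = verses in chapter c + 1 */
};

/* Hypothetical: return the 0-based line of book chapter:verse in the
 * verse file, or -1 if the reference is out of range. */
static long verse_line(const struct book_index *books, size_t n_books,
                       const char *book, int chapter, int verse)
{
    long line = 0;
    for (size_t b = 0; b < n_books; b++) {
        const struct book_index *bk = &books[b];
        if (strcmp(bk->id, book) != 0) {
            /* Skip every verse of the books that come before the target. */
            for (int c = 0; c < bk->chapters; c++)
                line += bk->verse_counts[c];
            continue;
        }
        if (chapter < 1 || chapter > bk->chapters ||
            verse < 1 || verse > bk->verse_counts[chapter - 1])
            return -1;
        for (int c = 0; c < chapter - 1; c++)
            line += bk->verse_counts[c]; /* earlier chapters of this book */
        return line + (verse - 1);
    }
    return -1; /* unknown book */
}
```

With a toy index of Gen = {3, 2} and Exod = {4}, Exod 1:3 lands on line 7 (0-based): five Genesis verses plus two earlier Exodus verses. A real index would carry the full verse counts, whether compiled in or loaded from the index file.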

Comparison

A test was done to see whether BibleC or haplous was faster. Each verse was retrieved 100 times, with a new instance set up each time.

Note that BibleC loaded its index file in each run.

Reference     haplous        BibleC
Gen 1 1:1     ~0.000225 ms   ~0.002346 ms
Exod 1 1:1    ~0.040692 ms   ~0.014129 ms
Rev 1 1:1     ~0.178406 ms   ~0.759610 ms

Neither implementation leaked memory.
haplous is faster for Gen 1 1:1 because, unlike BibleC, it does not have to parse a 4 KB index file before reading the first verse.

Most of the time, BibleC is faster than haplous: BibleC first calculates which line the verse is on and then seeks to it, while haplous has to parse every line it reads.
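The difference can be illustrated in C. Once the target line number is known, reaching it only requires counting newline characters: the skipped lines are never inspected for #book:, #chapter:, or ^ markers, which a haplous reader must check on every line. This is an illustrative sketch, not BibleC's code: read_line_at is a hypothetical name, and it assumes the verse file is scanned from the start; an implementation could instead store byte offsets and jump directly with fseek.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: copy 0-based line `target` of `f` into `buf` (without its
 * newline). Skipped lines are only counted, never parsed. Returns false if
 * the file has fewer lines than that. */
static bool read_line_at(FILE *f, long target, char *buf, size_t size)
{
    int ch;
    rewind(f); /* ASSUMPTION: always scan from the start of the verse file */
    for (long line = 0; line < target; ) {
        if ((ch = fgetc(f)) == EOF)
            return false;
        if (ch == '\n')
            line++;
    }
    if (fgets(buf, (int)size, f) == NULL)
        return false;
    buf[strcspn(buf, "\r\n")] = '\0';
    return true;
}
```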