Introduction
The serialization stage of assimilation produces a file that we call a final source file. It contains data, together with instructions telling PanLex how to import the data.
Instructions and data appear on separate lines of the file. Each line contains exactly one datum or one instruction.
Example
We showed you an example of a tabular file created from a Spanish–Zapotec dictionary source.
Serialization converts a file like that to a final source file, which looks like this:
:
0

mn
  dn
    spa-000
    astutamente
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adverbial
  dn
    zpq-000
    maños

mn
  dn
    spa-000
    astuto
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    maños

mn
  dn
    spa-000
    asustar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chšeb

mn
  dn
    spa-000
    asustar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Verbal
    dcs2
      art-303
      MorphosyntacticProperty
      art-302
      REFL
  dn
    zpq-000
    chžeb

mn
  df
    spa-000
    ataque (epiléptico)
  dn
    spa-000
    ataque
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      CommonNoun
    dcs2
      art-303
      GenderProperty
      art-303
      MasculineGender
  dn
    zpq-000
    šon

mn
  dn
    spa-000
    atar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chc̱hej
  dn
    zpq-000
    chda’ yag

mn
  df
    spa-000
    (estar) atado
  dn
    spa-000
    etado
  dn
    zpq-000
    chc̱hej
  dn
    zpq-000
    chda’ yag

mn
  dn
    spa-000
    atarantado
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    tarantadw

mn
  dn
    spa-000
    atarantarse
  dn
    zpq-000
    chec̱hol chenite

mn
  dn
    spa-000
    atardecer
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      IntransitiveVerb
  dn
    zpq-000
    chex̱jw gwbiž
  dn
    zpq-000
    chex̱jwža

mn
  dn
    spa-000
    atascarse
  dn
    zpq-000
    chaga’

mn
  df
    spa-000
    atascarse (sin poder orinar o defecar)
  dn
    spa-000
    atascarse
  dn
    zpq-000
    cheyjw

mn
  dn
    spa-000
    ataúd
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      CommonNoun
    dcs2
      art-303
      GenderProperty
      art-303
      MasculineGender
  dn
    zpq-000
    yi’iṉ

mn
  df
    spa-000
    atender (tomar en serio)
  dn
    spa-000
    atender
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chonen c̱he
  dn
    zpq-000
    chzi’ c̱he‣chzi’ diža’
  dn
    zpq-000
    chejḻe’

mn
  dn
    spa-000
    atrás
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adverbial
  dn
    zpq-000
    trasle

mn
  dn
    spa-000
    atrasado
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    trasadw

mn
  dn
    spa-000
    atravesar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chḻaga’
  dn
    zpq-000
    chde

mn
  dn
    spa-000
    atreverse
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Verbal
    dcs2
      art-303
      MorphosyntacticProperty
      art-302
      REFL
  dn
    zpq-000
    cheyaxje

mn
  dn
    spa-000
    atrevido
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    chogwlaz
If you compare the two files, you can see that they contain the same information, but the final source file states it more explicitly. For example, the final source file makes explicit that “astutamente” is an expression and that it is in Spanish (spa-000). Typically, each line of the tabular file is converted into a set of lines in the final source file.
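To make that conversion concrete, here is a minimal Python sketch that turns one simplified entry into the lines of a single meaning. The entry layout (a Spanish headword, an optional part-of-speech term, and a list of Zapotec translations) and the name serialize_entry are hypothetical; they are not the actual tabular-file format or the PanLex serialization tools.

```python
# Hypothetical sketch: serialize one simplified dictionary entry into
# final-source-file lines, following the pattern of the example above.
# The entry layout is an assumption for illustration only.

SPANISH = "spa-000"
ZAPOTEC = "zpq-000"
PROPERTY_VARIETY = "art-303"  # variety UID used for part-of-speech terms in the example


def serialize_entry(headword, part_of_speech, translations):
    """Return the lines for one meaning ("mn") as a list of strings."""
    lines = ["mn", "dn", SPANISH, headword]
    if part_of_speech:
        # Binary denotation classification attached to the Spanish denotation.
        lines += ["dcs2", PROPERTY_VARIETY, "PartOfSpeechProperty",
                  PROPERTY_VARIETY, part_of_speech]
    for translation in translations:
        # One Zapotec denotation per translation.
        lines += ["dn", ZAPOTEC, translation]
    return lines


print("\n".join(serialize_entry("astutamente", "Adverbial", ["maños"])))
```

Run on the first entry, this prints the same twelve lines as the first meaning of the example. Passing None as the part of speech omits the dcs2 detail, as in the “atarantarse” meaning.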
Syntax
A final source file must comply with a syntax that PanLex can parse.
You can think of a final source file as a set of meaning specifications, each beginning with a line that contains only “mn”. Each meaning specification contains specifications for one or more meaning details: meaning classifications, meaning properties, definitions, and denotations. Denotations, in turn, have their own details: denotation classifications and denotation properties.
Each detail specification contains 3 or more lines. The first line names the detail type, and the lines that follow depend on that type:
mcs1 (unary meaning classification): 1 expression specification, consisting of 1 line containing the UID of the expression’s language variety and 1 line containing the expression’s text
mcs2 (binary meaning classification): 2 expression specifications, each consisting of 1 line containing the UID of the expression’s language variety and 1 line containing the expression’s text
mpp (meaning property): 1 expression specification (as in mcs1 and mcs2) and 1 line containing a text
df (definition): 1 line containing the UID of the definition’s language variety and 1 line containing the definition’s text
dn (denotation): 1 expression specification (as in mcs1 and mcs2)
dcs1 (unary denotation classification): same as mcs1
dcs2 (binary denotation classification): same as mcs2
dpp (denotation property): same as mpp
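Because every detail type has a fixed number of following lines, those counts are enough to read a file back into nested meanings. The Python sketch below is not PanLex's parser. It assumes the lines are already stripped and non-blank, skips any lines before the first “mn” (the example's opening “:” and “0” lines are not described in this section), and attaches dcs1, dcs2, and dpp to the most recent dn, which is how the example is organized. The name parse_final_source is hypothetical.

```python
# Minimal sketch of a parser for the detail syntax listed above.
# Assumptions: input lines are stripped and non-blank; lines before the
# first "mn" are skipped; denotation details belong to the latest "dn".

# Number of lines that follow each detail-type line.
DETAIL_LINES = {
    "mcs1": 2, "mcs2": 4, "mpp": 3, "df": 2,
    "dn": 2, "dcs1": 2, "dcs2": 4, "dpp": 3,
}
DENOTATION_DETAILS = {"dcs1", "dcs2", "dpp"}


def parse_final_source(lines):
    """Parse stripped, non-blank lines into a list of meaning dicts."""
    meanings = []
    i = 0
    # Skip any header lines before the first meaning.
    while i < len(lines) and lines[i] != "mn":
        i += 1
    while i < len(lines):
        tag = lines[i]
        if tag == "mn":
            meanings.append({"details": [], "denotations": []})
            i += 1
            continue
        if tag not in DETAIL_LINES:
            raise ValueError(f"line {i + 1}: unknown detail type {tag!r}")
        n = DETAIL_LINES[tag]
        values = tuple(lines[i + 1:i + 1 + n])
        if len(values) < n:
            raise ValueError(f"line {i + 1}: {tag} needs {n} more lines")
        meaning = meanings[-1]
        if tag == "dn":
            meaning["denotations"].append({"expr": values, "details": []})
        elif tag in DENOTATION_DETAILS:
            if not meaning["denotations"]:
                raise ValueError(f"line {i + 1}: {tag} before any dn")
            meaning["denotations"][-1]["details"].append((tag, values))
        else:
            # mcs1, mcs2, mpp, df: details of the meaning itself.
            meaning["details"].append((tag, values))
        i += 1 + n
    return meanings
```

Since values are consumed by count rather than by content, a text line that happens to read “dn” or “mn” is still treated as data.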
The example file shown above contains blank lines. Blank lines are permitted but not required; you may insert them anywhere in the file except within a detail.
Leading and trailing whitespace is stripped from every line, so you can indent lines to make the logical structure clearer, as in the example above.
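The two rules on blank lines and whitespace can be checked in one pass. In this minimal Python sketch, the name clean_lines is hypothetical, and DETAIL_LINES repeats the per-type line counts from the syntax list so that the function can tell when it is inside a detail.

```python
# Minimal sketch: strip whitespace from every line and allow blank lines
# only between details. The function name is hypothetical.

# Number of lines that follow each detail-type line (from the syntax list).
DETAIL_LINES = {
    "mcs1": 2, "mcs2": 4, "mpp": 3, "df": 2,
    "dn": 2, "dcs1": 2, "dcs2": 4, "dpp": 3,
}


def clean_lines(text):
    """Return stripped, non-blank lines; reject blank lines inside a detail."""
    result = []
    remaining = 0  # lines still expected for the detail being read
    for number, raw in enumerate(text.split("\n"), start=1):
        line = raw.strip()
        if not line:
            if remaining:
                raise ValueError(f"line {number}: blank line inside a detail")
            continue
        result.append(line)
        if remaining:
            remaining -= 1  # a data line of the current detail
        elif line in DETAIL_LINES:
            remaining = DETAIL_LINES[line]
    return result
```

Indented input such as "mn\n  dn\n    spa-000\n    astutamente\n\n  dn\n    zpq-000\n    maños\n" yields the eight bare lines, while a blank line between “dn” and its variety UID raises ValueError.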
Final source files are UTF-8-encoded text files. Every line ends with a line-feed (LF) character (U+000A), the line break used on Unix and OS X.
You should configure your source-analysis environment to write final source files with LF line breaks. If you use the PanLex tools, this should be handled for you automatically on all platforms.
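If you write final source files from your own Python code rather than with the PanLex tools, one way to meet both requirements is sketched below. The name write_final_source is hypothetical. Passing newline="\n" to open() stops Python from translating "\n" into the platform's native line break, which is CRLF on Windows.

```python
import os
import tempfile


def write_final_source(path, lines):
    """Write lines to path as UTF-8 text, ending every line with LF (U+000A)."""
    # newline="\n" disables translation of "\n" to the platform's native
    # line break, so the output bytes are the same on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")


# Demonstration: write a small file to a temporary directory and inspect its bytes.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "example.txt")
    write_final_source(out, ["mn", "dn", "zpq-000", "maños"])
    with open(out, "rb") as f:
        raw = f.read()
    print(raw)  # the raw bytes use b"\n" separators and contain no b"\r"
```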