GAN 1.0

Installation

GAN is a Python script requiring Python (>3.6), IPython, pandas (>= 1.0) and xlrd (>= 1.2.0). It also uses itertools, which is part of the Python standard library and needs no separate installation.

To install the required dependencies, one option is to use the Miniconda package manager (https://docs.conda.io/en/latest/miniconda.html) and create a new environment from the bundled environment.yml file. The bundled scripts are then run inside that environment.

Commands:

# Create the environment (run it once)
conda env create -f environment.yml

# Activate the environment to use the scripts
conda activate gan
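
Once the environment is active, the installed packages can be checked against the requirements listed above. The following is a minimal sketch, not part of the repository: the helper names (version_tuple, at_least, check_dependencies) are hypothetical, and it assumes Python 3.8 or later, where importlib.metadata is available in the standard library.

```python
from importlib import metadata

# Minimum versions copied from the requirement list; None means "any version".
# The keys are the package names as published on PyPI / conda.
REQUIRED = {"ipython": None, "pandas": (1, 0), "xlrd": (1, 2, 0)}


def version_tuple(text):
    """Turn '1.2.0' into (1, 2, 0), stopping at suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version tuples after padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    padded_installed = installed + (0,) * (width - len(installed))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_installed >= padded_minimum


def check_dependencies(required=REQUIRED):
    """Return a dict mapping each problematic package to a short message."""
    problems = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if minimum is not None and not at_least(version_tuple(installed), minimum):
            wanted = ".".join(str(n) for n in minimum)
            problems[name] = f"found {installed}, need >= {wanted}"
    return problems


if __name__ == "__main__":
    # Prints nothing when every requirement is satisfied.
    for name, message in check_dependencies().items():
        print(f"{name}: {message}")
```

An empty result from check_dependencies() suggests the environment matches the stated requirements; it does not verify that the scripts themselves run.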

Usage

The repository comes with two scripts:

gan-genus.py

usage: gan-genus.py [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]

Generate bacterial genera with Excel input

optional arguments:
  -h, --help            show this help message and exit
  -1 FIRST, --first FIRST
                        First Excel file in "GAN" format
  -2 SECOND, --second SECOND
                        Second Excel file in "GAN" format
  -3 THIRD, --third THIRD
                        Third Excel file in "GAN" format
  -o OUTDIR, --outdir OUTDIR
                        Output directory
  -p PREFIX, --prefix PREFIX
                        Output basename [default: 'gan']
  -c CONNECTOR, --connector CONNECTOR
                        String connecting the explanatory strings [default: 'of']
  -v, --verbose         Increase output verbosity

The program requires two or three Excel tables, supplied with the -1, -2 and (optionally) -3 arguments, respectively.

The program requires an output directory to be specified (via -o), and optionally an output "basename" prefix (via -p).
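
When the script is driven from another Python program, the argument list can be assembled from the options in the help text above. This is a minimal sketch under that assumption: build_command is a hypothetical helper, not something shipped with the repository, and the default script path assumes the command is run from the base directory of the repository.

```python
def build_command(first, second, outdir, third=None, prefix=None,
                  connector=None, verbose=False,
                  script="./scripts/gan-genus.py"):
    """Return the argument list for gan-genus.py (hypothetical helper)."""
    cmd = [script, "-1", first, "-2", second]
    if third is not None:
        cmd += ["-3", third]          # the third table is optional
    cmd += ["-o", outdir]             # the output directory is required
    if prefix is not None:
        cmd += ["-p", prefix]         # output basename
    if connector is not None:
        cmd += ["-c", connector]      # string connecting the explanations
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    cmd = build_command("./test/table1.xlsx", "./test/table2.xlsx", "./test_output")
    print(" ".join(cmd))
    # To actually run the script, pass the list to the standard library:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.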

The repository comes with small test files to check that the program is working properly. From the base directory of the repository:

mkdir test_output
./scripts/gan-genus.py -1 ./test/table1.xlsx -2 ./test/table2.xlsx -o ./test_output

This will produce three files in the test_output directory, one for each of the formats described in the Output section (JSON, HTML and LaTeX).

Input

Each input file is an Excel file with at least one worksheet (any other worksheet is discarded). An empty template is provided in input_test/template.xlsx.

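It should contain the columns shown in the header row of the small example below (in any order).

Small example:

| Language | Gender | Part | Word        | Root      | Definition                   | Explanation |
|----------|--------|------|-------------|-----------|------------------------------|-------------|
| L.       | masc.  | n.   | admissarius | admissari | a stallion used for breeding | horses      |
| Gr.      | masc.  | n.   | Balios      | Balio     | a mythical horse             | horses      |
| L.       | masc.  | n.   | caballus    | caballi   | a horse                      | horses      |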
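
Before running the program, a table header can be checked for the expected columns. The sketch below is illustrative only: it assumes the required columns are exactly the seven in the example header, and missing_columns is a hypothetical helper rather than a function of the scripts.

```python
# Column names taken from the header of the small example; treating them as
# the complete list of required columns is an assumption.
REQUIRED_COLUMNS = ["Language", "Gender", "Part", "Word", "Root",
                    "Definition", "Explanation"]


def missing_columns(header):
    """Return the required columns absent from header, in canonical order.

    Column order in the input does not matter, and surrounding whitespace
    in the header cells is ignored.
    """
    present = {str(col).strip() for col in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


if __name__ == "__main__":
    shuffled = ["Word", "Root", "Language", "Explanation",
                "Gender", "Part", "Definition"]
    print(missing_columns(shuffled))             # all columns are present
    print(missing_columns(["Language", "Word"]))  # several are missing
```

With pandas installed, the header of a real file could be obtained with pandas.read_excel(path).columns and passed to missing_columns.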

Output

JSON format

The JSON output is an array. Each element is an object (dictionary) whose key is the compound name (e.g. _Admissaristercoradaptatus_) and whose value is an array of (type, value) pairs, where type specifies how to render the value.
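
A minimal sketch of reading such a structure from Python follows. The sample data is hypothetical: the type labels "italic" and "text" and the values are invented for illustration and are not taken from the program's real output, and iter_names and plain_text are hypothetical helper names.

```python
import json

# Hypothetical sample mirroring the described shape: an array of one-key
# objects, each mapping a compound name to a list of [type, value] pairs.
SAMPLE = """
[
  {"Admissaristercoradaptatus": [
      ["italic", "admissarius"],
      ["text", ", a stallion used for breeding"]
  ]}
]
"""


def iter_names(document):
    """Yield (compound name, pairs) for every element of the top-level array."""
    for element in document:
        for name, pairs in element.items():
            yield name, pairs


def plain_text(pairs):
    """Concatenate the values of the pairs, ignoring the rendering type."""
    return "".join(value for _type, value in pairs)


if __name__ == "__main__":
    for name, pairs in iter_names(json.loads(SAMPLE)):
        print(name, "->", plain_text(pairs))
```

JSON has no tuple type, so each (type, value) pair arrives in Python as a two-element list.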

HTML format

An HTML formatted list of compound words and their etymology.

Each item is provided as:

Admissaristercoricola -- Etymology: L. masc. n. admissarius, a stallion used for breeding; L. neut. n. stercus, excrement; N.L. masc./fem. n. cola, an inhabitant; Admissaristercoricola: a microbe of the faeces of horses.
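
The item above appears to follow a regular pattern: one "Language Gender Part Word, Definition" clause per component, joined by semicolons, followed by the compound name and its meaning. The sketch below reproduces that pattern from the input columns. It is an illustration of the layout, not the script's actual implementation; format_item is a hypothetical name, and the component values are copied from the example item.

```python
# Components copied from the example item; only the columns that appear in
# the rendered text are included.
EXAMPLE_COMPONENTS = [
    {"Language": "L.", "Gender": "masc.", "Part": "n.",
     "Word": "admissarius", "Definition": "a stallion used for breeding"},
    {"Language": "L.", "Gender": "neut.", "Part": "n.",
     "Word": "stercus", "Definition": "excrement"},
    {"Language": "N.L.", "Gender": "masc./fem.", "Part": "n.",
     "Word": "cola", "Definition": "an inhabitant"},
]


def format_item(name, components, meaning):
    """Build the text of one list item from its components (plain text, no HTML tags)."""
    clauses = [
        f"{c['Language']} {c['Gender']} {c['Part']} {c['Word']}, {c['Definition']}"
        for c in components
    ]
    return f"{name} -- Etymology: " + "; ".join(clauses) + f"; {name}: {meaning}."


if __name__ == "__main__":
    print(format_item("Admissaristercoricola", EXAMPLE_COMPONENTS,
                      "a microbe of the faeces of horses"))
```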

LaTeX format

A LaTeX source that can be compiled to produce a PDF document. It requires a config.tex file (supplied in the docs/ directory) and can be used to produce the PDF with this command:

pdflatex gan.tex

To install a LaTeX distribution on Ubuntu (requires ~5 GB of disk space):

sudo apt install texlive-full