API Documentation

pyndl.activation

pyndl.activation provides the functionality to estimate the activations of a trained NDL model for given events. The trained NDL model is represented by its outcome-cue weights.

pyndl.activation.activation(events, weights, *, n_jobs=1, number_of_threads=None, remove_duplicates=None, ignore_missing_cues=False)[source]

Estimate activations for given events in event file and outcome-cue weights.

Memory overhead for multiprocessing is one copy of weights plus a copy of cues for each thread.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

weights : xarray.DataArray or dict[dict[float]]

the xarray.DataArray needs to have the dimensions ‘outcomes’ and ‘cues’; the dictionaries hold weight[outcome][cue].

n_jobs : int

an integer giving the number of threads in which the job should be executed

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues unique per event; if False, keep multiple instances of the same cue (this is usually not preferred!)

ignore_missing_cues : {True, False}

if True, ignore cues which are in the test dataset but not in the weight matrix; if False, raise a KeyError for cues which are not in the weight matrix

Returns
activations : xarray.DataArray

with dimensions ‘outcomes’ and ‘events’ and coordinates for the outcomes; returned if weights is an instance of xarray.DataArray

or
activations : dict of numpy.arrays

a dict with outcomes as keys and numpy arrays as values, where each array holds one activation value per event; returned if weights is an instance of dict
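
A minimal sketch of estimating activations; all cue and outcome names are illustrative, and a toy model is trained first with pyndl.ndl.dict_ndl:

from pyndl import ndl, activation

# Toy (cues, outcomes) events; all names are illustrative.
events = [(['cue1', 'cue2'], ['outcome1']),
          (['cue2'], ['outcome2'])]

# Train a small model; make_data_array=True returns an xarray.DataArray.
weights = ndl.dict_ndl(events, alphas=0.1, betas=(0.1, 0.1),
                       remove_duplicates=True, make_data_array=True)

# Activations come back with dimensions 'outcomes' and 'events'.
activations = activation.activation(events, weights, remove_duplicates=True)
print(activations.loc[{'outcomes': 'outcome1'}])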

pyndl.corpus

pyndl.corpus generates a corpus file (outfile) out of a collection of gzipped XML subtitle files in a directory and all its subdirectories.

class pyndl.corpus.JobParseGz(break_duration)[source]

Bases: object

Stores the persistent information over several jobs and exposes a run method that only takes the varying parts as one argument.

Note

Using a closure is not possible as it is not picklable / serializable.

Methods

run

run(filename)[source]
pyndl.corpus.create_corpus_from_gz(directory, outfile, *, n_threads=1, verbose=False)[source]

Create a corpus file from a set of gzipped (.gz) files in a directory.

Parameters
directory : str

use all gz-files in this directory and all subdirectories as input.

outfile : str

name of the outfile that will be created.

n_threads : int

number of threads to use.

verbose : bool
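
A minimal usage sketch (both paths are placeholders):

from pyndl import corpus

# Collect all .gz subtitle files below 'subtitles/' into one corpus file.
corpus.create_corpus_from_gz('subtitles/', 'corpus.txt',
                             n_threads=4, verbose=True)
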
pyndl.corpus.read_clean_gzfile(gz_file_path, *, break_duration=2.0)[source]

Generator that opens and reads a gzipped XML subtitle file, while removing all XML tags and timestamps.

Parameters
break_duration : float

defines the amount of time in seconds that needs to pass between two subtitles in order to start a new paragraph in the resulting corpus.

Yields
line : non-empty, cleaned line out of the XML subtitle file
Raises
FileNotFoundError : if the file does not exist.
pyndl.correlation.correlation(semantics, activations, *, verbose=False, allow_nan=False)[source]

Calculates the correlations between the semantics and the activations.

Returns
np.array (n_outcomes, n_events)

The first column contains all correlations between the first event and all possible outcomes in the semantics (gold standard) space:

  1. correlation between the first event and the first outcome in the semantic (gold standard) space.

  2. correlation between the first event and the second outcome …

pyndl.count

pyndl.count provides functions in order to count

  • words and symbols in a corpus file

  • cues and outcomes in an event file

class pyndl.count.CuesOutcomes(n_events, cues, outcomes)

Bases: tuple

Attributes
cues

Alias for field number 1

n_events

Alias for field number 0

outcomes

Alias for field number 2

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

class pyndl.count.WordsSymbols(words, symbols)

Bases: tuple

Attributes
symbols

Alias for field number 1

words

Alias for field number 0

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

pyndl.count.cues_outcomes(event_file_name, *, n_jobs=2, number_of_processes=None, verbose=False)[source]

Counts cues and outcomes in event_file_name using n_jobs processes.

Returns
(n_events, cues, outcomes) : (int, collections.Counter, collections.Counter)
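
A quick counting sketch (the event file path is a placeholder):

from pyndl import count

n_events, cues, outcomes = count.cues_outcomes('events.tab.gz', n_jobs=2)
print(n_events)
print(cues.most_common(3))      # the three most frequent cues
print(outcomes.most_common(3))  # the three most frequent outcomes
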
pyndl.count.load_counter(filename)[source]

Loads a counter out of a tab-delimited text file.

pyndl.count.save_counter(counter, filename, *, header='key\tfreq\n')[source]

Saves a counter object into a tab-delimited text file.

pyndl.count.words_symbols(corpus_file_name, *, n_jobs=2, number_of_processes=None, lower_case=False, verbose=False)[source]

Counts words and symbols in corpus_file_name using n_jobs processes.

Returns
(words, symbols) : (collections.Counter, collections.Counter)
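
A sketch combining words_symbols with the counter helpers above (both file paths are placeholders):

from pyndl import count

words, symbols = count.words_symbols('corpus.txt', lower_case=True)

# Counters can be round-tripped through tab-delimited files.
count.save_counter(words, 'word_freq.tab')
words_again = count.load_counter('word_freq.tab')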

pyndl.io

pyndl.io provides functions to create event generators from different sources in order to use them with pyndl.ndl to train NDL models or to save existing events from a DataFrame or a list to a file.

pyndl.io.events_from_dataframe(df, columns=('cues', 'outcomes'))[source]

Yields events for all events in a pandas dataframe.

Parameters
df : pandas.DataFrame

a pandas DataFrame with one event per row, one column with the cues and one column with the outcomes.

columns : tuple

a tuple of column names

Yields
cues, outcomes : list, list

a tuple of two lists containing cues and outcomes
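
A minimal sketch, assuming the usual event file convention that several cues or outcomes within one cell are joined by underscores (data is illustrative):

import pandas as pd
from pyndl import io

df = pd.DataFrame({'cues': ['a_b', 'a'], 'outcomes': ['x', 'x_y']})
for cues, outcomes in io.events_from_dataframe(df):
    print(cues, outcomes)  # e.g. ['a', 'b'] ['x']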

pyndl.io.events_from_file(event_path, compression='gzip', start=0, step=1)[source]

Yields events for all events in a gzipped event file.

Parameters
event_path : str

path to the gzipped event file

compression : str

indicates whether the events should be read from a gzipped file or not; can be {“gzip” or None}

start : int

first event to read

step : int

slice every step-th event (useful for parallel computations)

Yields
cues, outcomes : list, list

a tuple of two lists containing cues and outcomes
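
For example, reading every second event (the path is a placeholder):

from pyndl import io

for cues, outcomes in io.events_from_file('events.tab.gz', start=0, step=2):
    print(cues, outcomes)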

pyndl.io.events_from_list(lst)[source]

Yields events for all events in a list.

Parameters
lst : list of list of str or list of str

a list either containing, per event, a list of cues as strings and a list of outcomes as strings, or containing, per event, a cue string and an outcome string, where the cues respectively the outcomes are separated by an underscore

Yields
cues, outcomes : list, list

a tuple of two lists containing cues and outcomes
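
A sketch of both input styles described above (names are illustrative):

from pyndl import io

events1 = list(io.events_from_list([(['a', 'b'], ['x'])]))
events2 = list(io.events_from_list([['a_b', 'x']]))
# both yield cues ['a', 'b'] and outcomes ['x']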

pyndl.io.events_to_file(events, file_path, delimiter='\t', compression='gzip', columns=('cues', 'outcomes'), compatible=False)[source]

Writes events to a file.

Parameters
events : pandas.DataFrame or Iterator or Iterable

a pandas DataFrame with one event per row, one column with the cues and one column with the outcomes; or a list of cue and outcome strings; or a list of a list of cues and a list of outcomes; which should be written to a file

file_path : str

path to where the file should be saved

delimiter : str

separator which should be used. Default is a tab

compression : str

indicates whether the events should be written to a gzipped file or not; can be {“gzip” or None}

columns : tuple

a tuple of column names

compatible : bool

if True, add a third frequency column (all ones) for compatibility with ndl2
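
A minimal sketch (the output path is a placeholder):

from pyndl import io

events = [(['a', 'b'], ['x']), (['a'], ['y'])]
# Writes a gzipped, tab-separated event file with columns 'cues' and 'outcomes'.
io.events_to_file(events, 'events.tab.gz')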

pyndl.io.safe_write_path(path, template='{path.stem}-{counter}{path.suffix}')[source]

Create a file path to avoid overwriting existing files. Returns the original path if it does not exist or an incremented version according to the template.

This function with the default template creates filenames like pathname/example.png, pathname/example-1.png, pathname/example-2.png, …

Parameters
path : file path
template : format string for the incremented file name.

available variables are counter (int) and path (pathlib.Path).

Returns
path : the input path or (if the file exists) the path with an incremented filename.
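
For example, assuming a string path is accepted (the directory and file name are placeholders):

from pyndl.io import safe_write_path

# Returns 'results/weights.nc' if it does not exist yet, otherwise
# 'results/weights-1.nc', then 'results/weights-2.nc', and so on.
out_path = safe_write_path('results/weights.nc')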

pyndl.ndl

pyndl.ndl provides functions in order to train NDL models.

class pyndl.ndl.WeightDict(*args, **kwargs)[source]

Bases: defaultdict

Subclass of defaultdict to represent outcome-cue weights.

Notes

The weight for each outcome-cue combination is 0 by default.

Attributes
attrs
default_factory

Factory for default value called by __missing__().

Methods

clear()

copy()

fromkeys(iterable[, value])

Create a new dictionary with keys from iterable and values set to value.

get(key[, default])

Return the value for key if key is in the dictionary, else default.

items()

keys()

pop(key[, default])

If key is not found, default is returned if given, otherwise KeyError is raised

popitem(/)

Remove and return a (key, value) pair as a 2-tuple.

setdefault(key[, default])

Insert key with a value of default if key is not in the dictionary.

update([E, ]**F)

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values()

property attrs
pyndl.ndl.data_array(weights, *, attrs=None)[source]

Convert a dict of dicts of weights or a WeightDict into an xarray.DataArray.

Parameters
weights : dict of dicts of floats or WeightDict

the first dict has outcomes as keys and dicts as values; the second dict has cues as keys and weights as values; weights[outcome][cue] gives the weight between outcome and cue. If a dict of dicts is given, attrs is required; if a WeightDict is given, attrs is optional.

attrs : dict

a dictionary of attributes

Returns
weights : xarray.DataArray

with dimensions ‘outcomes’ and ‘cues’. You can look up the weights between a cue and an outcome with weights.loc[{'outcomes': outcome, 'cues': cue}] or weights.loc[outcome].loc[cue].
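
A minimal sketch (the weight values and the attrs dict are illustrative):

from pyndl import ndl

# attrs is required here because a plain dict of dicts is passed.
weights_dict = {'outcome1': {'cue1': 0.2, 'cue2': 0.0}}
weights = ndl.data_array(weights_dict, attrs={'comment': 'toy weights'})
print(weights.loc[{'outcomes': 'outcome1', 'cues': 'cue1'}])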

pyndl.ndl.dict_ndl(events, alphas, betas, lambda_=1.0, *, weights=None, inplace=False, remove_duplicates=None, make_data_array=False, verbose=False)[source]

Calculate the weights for all_outcomes over all events in event_file.

This is a pure python implementation using dicts.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

alphas : dict or float

a (default)dict having cues as keys and a value below 1 as value

betas : (float, float)

one value for successful prediction (reward), one for punishment

lambda_ : float
weights : dict of dicts or xarray.DataArray or None

initial weights

inplace : {True, False}

if True, calculate the weight matrix inplace; if False, create a new weight matrix to learn on

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

make_data_array : {False, True}

if True, make an xarray.DataArray out of the dict of dicts.

verbose : bool

print some output if True.

Returns
weights : dict of dicts of floats

the first dict has outcomes as keys and dicts as values; the second dict has cues as keys and weights as values; weights[outcome][cue] gives the weight between outcome and cue.

or
weights : xarray.DataArray

with dimensions ‘outcomes’ and ‘cues’. You can look up the weights between a cue and an outcome with weights.loc[{'outcomes': outcome, 'cues': cue}] or weights.loc[outcome].loc[cue].

Notes

The metadata will only be stored when make_data_array is True, and then dict_ndl cannot be used to continue learning. At the moment there is no proper way to automatically store the metadata into the default dict.
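
A minimal dict_ndl sketch with a toy event list (all names and parameter values are illustrative):

from pyndl import ndl

events = [(['cue1', 'cue2'], ['outcome1']),
          (['cue1'], ['outcome2'])]
weights = ndl.dict_ndl(events, alphas=0.1, betas=(0.1, 0.1),
                       remove_duplicates=True)
print(weights['outcome1']['cue1'])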

pyndl.ndl.ndl(events, alpha, betas, lambda_=1.0, *, method='openmp', weights=None, number_of_threads=None, n_jobs=8, len_sublists=None, n_outcomes_per_job=10, remove_duplicates=None, verbose=False, temporary_directory=None, events_per_temporary_file=10000000)[source]

Calculate the weights for all_outcomes over all events in the event file given by its path.

This is a parallel python implementation using numpy, multithreading and the binary format defined in preprocess.py.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

alpha : float

saliency of all cues

betas : (float, float)

one value for successful prediction (reward), one for punishment

lambda_ : float
method : {‘openmp’, ‘threading’}
weights : None or xarray.DataArray

the xarray.DataArray needs to have the named dimensions ‘cues’ and ‘outcomes’

n_jobs : int

an integer giving the number of threads in which the job should be executed

n_outcomes_per_job : int

an integer giving the length of the sublists generated from all outcomes

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

verbose : bool

print some output if True.

temporary_directory : str

path to the directory to use for storing the temporary files created; if none is provided, the operating system’s default will be used (/tmp on unix)

events_per_temporary_file : int

Number of events in each temporary binary file. Has to be larger than 1

Returns
weights : xarray.DataArray

with dimensions ‘outcomes’ and ‘cues’. You can look up the weights between a cue and an outcome with weights.loc[{'outcomes': outcome, 'cues': cue}] or weights.loc[outcome].loc[cue].
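
A minimal sketch of training ndl from an event file (the path and the lookup names are placeholders; the alpha and betas values are illustrative):

from pyndl import ndl

weights = ndl.ndl('events.tab.gz', alpha=0.1, betas=(0.1, 0.1),
                  method='openmp', remove_duplicates=True)
# Look up one weight (assumes 'cue1' and 'outcome1' occur in the events).
print(weights.loc[{'outcomes': 'outcome1', 'cues': 'cue1'}])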

pyndl.ndl.slice_list(list_, len_sublists)[source]

Slices a list into sublists of length len_sublists.

Parameters
list_ : list

list which should be sliced into sublists

len_sublists : int

integer which determines the length of the sublists

Returns
seq_list : list of lists

a list of sublists with the length len_sublists
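
For example (assuming the trailing sublist is simply shorter when the list length is not a multiple of len_sublists):

from pyndl.ndl import slice_list

print(slice_list([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]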

pyndl.preprocess

pyndl.preprocess provides functions in order to preprocess data and create event files from it.

class pyndl.preprocess.JobFilter(keep_cues, keep_outcomes, remove_cues, remove_outcomes, cue_map, outcome_map)[source]

Bases: object

Stores the persistent information over several jobs and exposes a job method that only takes the varying parts as one argument.

Note

Using a closure is not possible as it is not picklable / serializable.

Methods

job

process_cues

process_cues_all

process_cues_keep

process_cues_map

process_cues_remove

process_outcomes

process_outcomes_all

process_outcomes_keep

process_outcomes_map

process_outcomes_remove

return_empty_string

job(line)[source]
process_cues(cues)[source]
process_cues_all(cues)[source]
process_cues_keep(cues)[source]
process_cues_map(cues)[source]
process_cues_remove(cues)[source]
process_outcomes(outcomes)[source]
process_outcomes_all(outcomes)[source]
process_outcomes_keep(outcomes)[source]
process_outcomes_map(outcomes)[source]
process_outcomes_remove(outcomes)[source]
static return_empty_string()[source]
pyndl.preprocess.bandsample(population, sample_size=50000, *, cutoff=5, seed=None, verbose=False)[source]

Creates a sample of size sample_size out of the population using band sampling.

pyndl.preprocess.create_binary_event_files(event_file, path_name, cue_id_map, outcome_id_map, *, sort_within_event=False, n_jobs=2, events_per_file=10000000, overwrite=False, remove_duplicates=None, verbose=False)[source]

Creates the binary event files for a tabular cue outcome corpus.

Parameters
event_file : str

path to tab separated text file that contains all events in a cue outcome table.

path_name : str

folder name where to store the binary event files

cue_id_map : dict (str -> int)

cue to id map

outcome_id_map : dict (str -> int)

outcome to id map

sort_within_event : bool

should we sort the cues and outcomes within the event

n_jobs : int

number of threads to use

events_per_file : int

Number of events in each binary file. Has to be larger than 1

overwrite : bool

overwrite files if they exist

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

verbose : bool
Returns
number_events : int

sum of the number of events written to the binary files

pyndl.preprocess.create_event_file(corpus_file, event_file, *, allowed_symbols='*', context_structure='document', event_structure='consecutive_words', event_options=(3,), cue_structure='trigrams_to_word', lower_case=False, remove_duplicates=True, verbose=False)[source]

Create a text based event file from a corpus file.

Warning

‘_’, ‘#’, and TAB are removed from the input of the corpus file and replaced by a ‘ ‘, which is treated as a word boundary.

Parameters
corpus_file : str

path where the corpus file is

event_file : str

path where the output file will be created

allowed_symbols : str, function

all allowed symbols to include in the events as a set of characters. The set of characters might be explicit or contain Regex character sets.

‘_’, ‘#’, and TAB are special symbols in the event file and will be removed automatically. If the corpus file contains these special symbols a warning will be given.

These examples define the same allowed symbols:

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
'a-zA-Z'
'*'

or a function indicating which characters to include. The function should return True if the passed character is an allowed symbol.

For example:

lambda chr: chr in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
lambda chr: ('a' <= chr <= 'z') or ('A' <= chr <= 'Z')
context_structure : {“document”, “paragraph”, “line”}
event_structure : {“line”, “consecutive_words”, “word_to_word”, “sentence”}
event_options : None or (number_of_words,) or (before, after)

in “consecutive_words” the number of words of the sliding window as an integer; in “word_to_word” the number of words before and after the word of interest, each as an integer.

cue_structure : {“trigrams_to_word”, “word_to_word”, “bigrams_to_word”}
lower_case : bool

should the cues and outcomes be lower cased

remove_duplicates : bool

create unique cues and outcomes per event

verbose : bool

Notes

Breaks / Separators :

What marks parts, where we do not want to continue learning?

  • ---end.of.document--- string?

  • line breaks?

  • empty lines?

What do we consider one event?

  • three consecutive words?

  • one line of the corpus?

  • everything between two empty lines?

  • everything within one document?

Should the events be connected to the events before and after?

No.

Context :

A context is a whole document or a paragraph within which we will take (three) consecutive words as occurrences or events. The last words of a context will not form an occurrence with the first words of the next context.

Occurrence :

An occurrence or event will result in one event in the end. This can be (three) consecutive words, a sentence, or a line in the corpus file.
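
A minimal sketch of create_event_file turning a plain text corpus into trigram-to-word events (both file names are placeholders):

from pyndl import preprocess

preprocess.create_event_file('corpus.txt', 'events.tab.gz',
                             allowed_symbols='a-zA-Z',
                             context_structure='document',
                             event_structure='consecutive_words',
                             event_options=(3,),
                             cue_structure='trigrams_to_word',
                             lower_case=True)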

pyndl.preprocess.event_generator(event_file, cue_id_map, outcome_id_map, *, sort_within_event=False)[source]
pyndl.preprocess.filter_event_file(input_event_file, output_event_file, *, keep_cues='all', keep_outcomes='all', remove_cues=None, remove_outcomes=None, cue_map=None, outcome_map=None, n_jobs=1, number_of_processes=None, chunksize=100000, verbose=False)[source]

Filter an event file by a list or a map of cues and outcomes.

Parameters
You can either use keep_*, remove_*, or map_*.
input_event_file : str

path where the input event file is

output_event_file : str

path where the output file will be created

keep_cues : “all” or sequence of str

list of all cues that should be kept

keep_outcomes : “all” or sequence of str

list of all outcomes that should be kept

remove_cues : None or sequence of str

list of all cues that should be removed

remove_outcomes : None or sequence of str

list of all outcomes that should be removed

cue_map : dict

maps every cue as key to the value. Removes all cues that do not have a key. This can be used to map several different cues to the same cue or to rename cues.

outcome_map : dict

maps every outcome as key to the value. Removes all outcomes that do not have a key. This can be used to map several different outcomes to the same outcome or to rename outcomes.

n_jobs : int

number of threads to use

chunksize : int

number of chunks per submitted job, should be around 100000

Notes

It will keep all cues that are within the event and that (for a human reader) might clearly belong to a removed outcome. This is on purpose and is the expected behaviour as these cues are in the context of this outcome.

If an event has no cues it gets removed, but if an event has no outcomes it is still present in order to capture the background rate of those cues.
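
A minimal filter_event_file sketch that keeps only a whitelist of outcomes (the file names and outcomes are placeholders):

from pyndl import preprocess

preprocess.filter_event_file('events.tab.gz', 'events_filtered.tab.gz',
                             keep_outcomes=('house', 'tree'))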

pyndl.preprocess.ngrams_to_word(occurrences, n_chars, outfile, remove_duplicates=True)[source]

Process the occurrences and write them to outfile.

Parameters
occurrences : sequence of (cues, outcomes) tuples

cues and outcomes are both strings where underscores and # are special symbols.

n_chars : number of characters (e.g. 2 for bigrams, 3 for trigrams, …)
outfile : file handle
remove_duplicates : bool

if True make cues and outcomes per event unique

pyndl.preprocess.process_occurrences(occurrences, outfile, *, cue_structure='trigrams_to_word', remove_duplicates=True)[source]

Process the occurrences and write them to outfile.

Parameters
occurrences : sequence of (cues, outcomes) tuples

cues and outcomes are both strings where underscores and # are special symbols.

outfile : file handle
cue_structure : {‘bigrams_to_word’, ‘trigrams_to_word’, ‘word_to_word’}
remove_duplicates : bool

if True make cues and outcomes per event unique

pyndl.preprocess.read_binary_file(binary_file_path)[source]
pyndl.preprocess.to_bytes(int_)[source]
pyndl.preprocess.to_integer(byte_)[source]
pyndl.preprocess.write_events(events, filename, *, start=0, stop=4294967295, remove_duplicates=None)[source]

Write out a list of events to a disk file in binary format.

Parameters
events : iterator of (cue_ids, outcome_ids) tuples, called events
filename : string
start : first event to write (zero based index)
stop : last event to write (zero based index; excluded)
remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

Returns
number_events : int

actual number of events written to the file

Raises
StopIteration : if the events generator is exhausted before stop is reached

Notes

The binary format has the following structure:

8 byte header
nr of events
nr of cues in first event
ids for every cue
nr of outcomes in first event
ids for every outcome
nr of cues in second event
...

pyndl.wh

pyndl.wh provides functions in order to train Widrow-Hoff (WH) models. In contrast to the Rescorla-Wagner (RW) models, the WH models are not limited to binary cues and outcomes, but can encode gradual intensities in the cues and outcomes. This is done by associating a vector of continuous values (real numbers) with each cue and outcome. The size of the vector has to be the same for all cues and for all outcomes, but can differ between cues and outcomes.

It is possible to calculate weights for continuous cues or continuous outcomes, while keeping the outcomes respectively cues binary. Finally, it is possible for both sides, cues and outcomes, to be continuous and to calculate the Widrow-Hoff learning rule between them.

pyndl.wh.dict_wh(events, eta, cue_vectors, outcome_vectors, *, weights=None, inplace=False, remove_duplicates=None, make_data_array=False, verbose=False)[source]

Calculate the weights for all_outcomes over all events in events.

This is a pure python implementation using dicts.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

eta : float

learning rate

cue_vectors : xarray.DataArray

matrix that contains the cue vectors for each cue

outcome_vectors : xarray.DataArray

matrix that contains the target vectors for each outcome

weights : dict of dicts or xarray.DataArray or None

initial weights

inplace : {True, False}

if True, calculate the weight matrix inplace; if False, create a new weight matrix to learn on

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

make_data_array : {False, True}

if True, make an xarray.DataArray out of the dict of dicts.

verbose : bool

print some output if True.

Returns
weights : dict of dicts of floats

the first dict has outcomes as keys and dicts as values; the second dict has cues as keys and weights as values; weights[outcome][cue] gives the weight between outcome and cue.

or
weights : xarray.DataArray

with dimensions ‘outcome_vector_dimensions’ and ‘cue_vector_dimensions’. You can look up the weights between a cue dimension and an outcome dimension with weights.loc[{'outcome_vector_dimensions': outcome_vector_dimension, 'cue_vector_dimensions': cue_vector_dimension}] or weights.loc[outcome_vector_dimension].loc[cue_vector_dimension].

Notes

The metadata will only be stored when make_data_array is True, and then dict_wh cannot be used to continue learning. At the moment there is no proper way to automatically store the metadata into the default dict.

Furthermore, this implementation only supports the ‘real to real’ case where cue vectors are learned on outcome vectors. For the ‘binary to real’ or ‘real to binary’ cases the wh.wh function needs to be used which uses a fast cython implementation.

The main purpose of this function is to have a reference implementation which is used to validate the faster cython version against. Additionally, this function can be a good starting point to develop different flavors of the Widrow-Hoff learning rule.

pyndl.wh.wh(events, eta, *, cue_vectors=None, outcome_vectors=None, method='openmp', weights=None, n_jobs=8, n_outcomes_per_job=10, remove_duplicates=None, verbose=False, temporary_directory=None, events_per_temporary_file=10000000)[source]

Calculate the weights for all events using the Widrow-Hoff learning rule in three different flavors.

In the first flavor, cues and outcomes both are vectors and the names in the event files refer to these vectors. The vectors for all cues and outcomes are given as xarray.DataArrays with the arguments cue_vectors and outcome_vectors.

In the second and third flavor, only the cues or only the outcomes are treated as vectors and the ones not being treated as vectors are still considered being present or not being present in a binary way.

This is a parallel python implementation using cython, numpy, multithreading and the binary format defined in preprocess.py.

Parameters
events : str

path to the event file

eta : float

learning rate

cue_vectors : xarray.DataArray

matrix that contains the cue vectors for each cue

outcome_vectors : xarray.DataArray

matrix that contains the target vectors for each outcome

method : {‘openmp’, ‘threading’, ‘numpy’}

‘numpy’ works only for real to real Widrow-Hoff.

weights : None or xarray.DataArray

the xarray.DataArray needs to have the named dimensions ‘cues’ or ‘cue_vector_dimensions’ and ‘outcomes’ or ‘outcome_vector_dimensions’

n_jobs : int

an integer giving the number of threads in which the job should be executed

n_outcomes_per_job : int

an integer giving the number of outcomes that are processed in one job

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

verbose : bool

print some output if True

temporary_directory : str

path to the directory to use for storing the temporary files created; if none is provided, the operating system’s default will be used (‘/tmp’ on unix)

events_per_temporary_file : int

Number of events in each temporary binary file. Has to be larger than 1

Returns
weights : xarray.DataArray

the dimensions of the weights reflect the type of Widrow-Hoff that was run (real to real, binary to real, real to binary or binary to binary). The dimension names reflect this in the weights: they are a combination of ‘outcomes’ x ‘outcome_vector_dimensions’ and ‘cues’ x ‘cue_vector_dimensions’. You can look up the weights between an outcome vector dimension and a cue vector dimension with weights.loc[{'outcome_vector_dimensions': outcome_vector_dimension, 'cue_vector_dimensions': cue_vector_dimension}] or weights.loc[outcome_vector_dimension].loc[cue_vector_dimension].
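
A minimal sketch of the ‘real to real’ flavor (the event file path, all names, and all vector values are illustrative):

import numpy as np
import xarray as xr
from pyndl import wh

# Every cue and outcome name in the event file refers to one of these vectors.
cue_vectors = xr.DataArray(np.array([[1.0, 0.0], [0.0, 1.0]]),
                           dims=('cues', 'cue_vector_dimensions'),
                           coords={'cues': ['cue1', 'cue2'],
                                   'cue_vector_dimensions': ['c_dim1', 'c_dim2']})
outcome_vectors = xr.DataArray(np.array([[1.0], [-1.0]]),
                               dims=('outcomes', 'outcome_vector_dimensions'),
                               coords={'outcomes': ['outcome1', 'outcome2'],
                                       'outcome_vector_dimensions': ['o_dim1']})

weights = wh.wh('events.tab.gz', eta=0.01,
                cue_vectors=cue_vectors,
                outcome_vectors=outcome_vectors,
                method='numpy')  # 'numpy' supports the real to real case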