API Documentation

pyndl.activation

pyndl.activation provides the functionality to estimate the activations of a trained NDL model for given events. The trained NDL model is represented by its outcome-cue weights.

pyndl.activation.activation(events, weights, *, n_jobs=1, number_of_threads=None, remove_duplicates=None, ignore_missing_cues=False)[source]

Estimate activations for given events in event file and outcome-cue weights.

Memory overhead for multiprocessing is one copy of weights plus a copy of cues for each thread.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

weights : xarray.DataArray or dict[dict[float]]

the xarray.DataArray needs to have the dimensions ‘outcomes’ and ‘cues’; the dictionaries hold weight[outcome][cue].

n_jobs : int

an integer giving the number of threads in which the job should be executed

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues unique per event; if False, keep multiple instances of the same cue (this is usually not preferred!)

ignore_missing_cues : {True, False}

if True, ignore cues which are in the test dataset but not in the weight matrix; if False, raise a KeyError for cues which are not in the weight matrix

Returns
activations : xarray.DataArray

with dimensions ‘outcomes’ and ‘events’ and coordinates for the outcomes; returned if weights is an instance of xarray.DataArray

or
activations : dict of numpy.arrays

a dict with outcomes as keys and numpy arrays as values, where each array holds one activation value per event; returned if weights is an instance of dict
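
A minimal sketch of estimating activations; all cue and outcome names are illustrative, and a toy model is trained first with pyndl.ndl.dict_ndl:

from pyndl import ndl, activation

# Toy (cues, outcomes) events; all names are illustrative.
events = [(['cue1', 'cue2'], ['outcome1']),
          (['cue2'], ['outcome2'])]

# Train a small model; make_data_array=True returns an xarray.DataArray.
weights = ndl.dict_ndl(events, alphas=0.1, betas=(0.1, 0.1),
                       remove_duplicates=True, make_data_array=True)

# Activations come back with dimensions 'outcomes' and 'events'.
activations = activation.activation(events, weights, remove_duplicates=True)
print(activations.loc[{'outcomes': 'outcome1'}])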

pyndl.corpus

pyndl.corpus generates a corpus file (outfile) out of a collection of gzipped XML subtitle files in a directory and all its subdirectories.

class pyndl.corpus.JobParseGz(break_duration)[source]

Bases: object

Stores the persistent information over several jobs and exposes a run method that only takes the varying parts as one argument.

Note

Using a closure is not possible as it is not picklable / serializable.

Methods

run

run(filename)[source]
pyndl.corpus.create_corpus_from_gz(directory, outfile, *, n_threads=1, verbose=False)[source]

Create a corpus file from a set of gzipped (.gz) files in a directory.

Parameters
directory : str

use all gz-files in this directory and all subdirectories as input.

outfile : str

name of the outfile that will be created.

n_threads : int

number of threads to use.

verbose : bool
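
A minimal usage sketch (both paths are placeholders):

from pyndl import corpus

# Collect all .gz subtitle files below 'subtitles/' into one corpus file.
corpus.create_corpus_from_gz('subtitles/', 'corpus.txt',
                             n_threads=4, verbose=True)
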
pyndl.corpus.read_clean_gzfile(gz_file_path, *, break_duration=2.0)[source]

Generator that opens and reads a gzipped XML subtitle file, while removing all XML tags and timestamps.

Parameters
break_duration : float

defines the amount of time in seconds that needs to pass between two subtitles in order to start a new paragraph in the resulting corpus.

Yields
line : non-empty, cleaned line out of the XML subtitle file
Raises
FileNotFoundError : if the file does not exist.
pyndl.correlation.correlation(semantics, activations, *, verbose=False, allow_nan=False)[source]

Calculates the correlations between the semantics and the activations.

Returns
np.array (n_outcomes, n_events)

The first column contains all correlations between the first event and all possible outcomes in the semantics (gold standard) space:

  1. correlation between the first event and the first outcome in the semantic (gold standard) space.

  2. correlation between the first event and the second outcome …

pyndl.count

pyndl.count provides functions in order to count

  • words and symbols in a corpus file

  • cues and outcomes in an event file

class pyndl.count.CuesOutcomes(n_events, cues, outcomes)

Bases: tuple

Attributes
cues

Alias for field number 1

n_events

Alias for field number 0

outcomes

Alias for field number 2

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

class pyndl.count.WordsSymbols(words, symbols)

Bases: tuple

Attributes
symbols

Alias for field number 1

words

Alias for field number 0

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

pyndl.count.cues_outcomes(event_file_name, *, n_jobs=2, number_of_processes=None, verbose=False)[source]

Counts cues and outcomes in event_file_name using n_jobs processes.

Returns
(n_events, cues, outcomes) : (int, collections.Counter, collections.Counter)
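
A quick counting sketch (the event file path is a placeholder):

from pyndl import count

n_events, cues, outcomes = count.cues_outcomes('events.tab.gz', n_jobs=2)
print(n_events)
print(cues.most_common(3))      # the three most frequent cues
print(outcomes.most_common(3))  # the three most frequent outcomes
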
pyndl.count.load_counter(filename)[source]

Loads a counter out of a tab-delimited text file.

pyndl.count.save_counter(counter, filename, *, header='key\tfreq\n')[source]

Saves a counter object into a tab-delimited text file.

pyndl.count.words_symbols(corpus_file_name, *, n_jobs=2, number_of_processes=None, lower_case=False, verbose=False)[source]

Counts words and symbols in corpus_file_name using n_jobs processes.

Returns
(words, symbols) : (collections.Counter, collections.Counter)
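
A sketch combining words_symbols with the counter helpers above (both file paths are placeholders):

from pyndl import count

words, symbols = count.words_symbols('corpus.txt', lower_case=True)

# Counters can be round-tripped through tab-delimited files.
count.save_counter(words, 'word_freq.tab')
words_again = count.load_counter('word_freq.tab')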

pyndl.io

pyndl.io provides functions to create event generators from different sources in order to use them with pyndl.ndl to train NDL models or to save existing events from a DataFrame or a list to a file.

pyndl.io.events_from_dataframe(df, columns=('cues', 'outcomes'))[source]

Yields events for all events in a pandas dataframe.

Parameters
df : pandas.DataFrame

a pandas DataFrame with one event per row, one column with the cues and one column with the outcomes.

columns : tuple

a tuple of column names

Yields
cues, outcomes : list, list

a tuple of two lists containing cues and outcomes
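
A minimal sketch, assuming the usual event file convention that several cues or outcomes within one cell are joined by underscores (data is illustrative):

import pandas as pd
from pyndl import io

df = pd.DataFrame({'cues': ['a_b', 'a'], 'outcomes': ['x', 'x_y']})
for cues, outcomes in io.events_from_dataframe(df):
    print(cues, outcomes)  # e.g. ['a', 'b'] ['x']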

pyndl.io.events_from_file(event_path, compression='gzip', start=0, step=1)[source]

Yields events for all events in a gzipped event file.

Parameters
event_path : str

path to the gzipped event file

compression : str

indicates whether the events should be read from a gzipped file or not; can be {“gzip” or None}

start : int

first event to read

step : int

slice every step-th event (useful for parallel computations)

Yields
cues, outcomes : list, list

a tuple of two lists containing cues and outcomes
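
For example, reading every second event (the path is a placeholder):

from pyndl import io

for cues, outcomes in io.events_from_file('events.tab.gz', start=0, step=2):
    print(cues, outcomes)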

pyndl.io.events_from_list(lst)[source]

Yields events for all events in a list.

Parameters
lst : list of list of str or list of str

a list either containing, per event, a list of cues as strings and a list of outcomes as strings, or containing, per event, a cue string and an outcome string, where the cues respectively the outcomes are separated by an underscore

Yields
cues, outcomes : list, list

a tuple of two lists containing cues and outcomes
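
A sketch of both input styles described above (names are illustrative):

from pyndl import io

events1 = list(io.events_from_list([(['a', 'b'], ['x'])]))
events2 = list(io.events_from_list([['a_b', 'x']]))
# both yield cues ['a', 'b'] and outcomes ['x']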

pyndl.io.events_to_file(events, file_path, delimiter='\t', compression='gzip', columns=('cues', 'outcomes'), compatible=False)[source]

Writes events to a file.

Parameters
events : pandas.DataFrame or Iterator or Iterable

a pandas DataFrame with one event per row, one column with the cues and one column with the outcomes; or a list of cue and outcome strings; or a list of a list of cues and a list of outcomes; which should be written to a file

file_path : str

path to where the file should be saved

delimiter : str

separator which should be used. Default is a tab

compression : str

indicates whether the events should be written to a gzipped file or not; can be {“gzip” or None}

columns : tuple

a tuple of column names

compatible : bool

if True, add a third frequency column (all ones) for compatibility with ndl2
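
A minimal sketch (the output path is a placeholder):

from pyndl import io

events = [(['a', 'b'], ['x']), (['a'], ['y'])]
# Writes a gzipped, tab-separated event file with columns 'cues' and 'outcomes'.
io.events_to_file(events, 'events.tab.gz')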

pyndl.io.safe_write_path(path, template='{path.stem}-{counter}{path.suffix}')[source]

Create a file path to avoid overwriting existing files. Returns the original path if it does not exist or an incremented version according to the template.

This function with the default template creates filenames like pathname/example.png, pathname/example-1.png, pathname/example-2.png, …

Parameters
path : file path
template : format string for the incremented file name.

available variables are counter (int) and path (pathlib.Path).

Returns
path : the input path or (if the file exists) the path with an incremented filename.
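
For example, assuming a string path is accepted (the directory and file name are placeholders):

from pyndl.io import safe_write_path

# Returns 'results/weights.nc' if it does not exist yet, otherwise
# 'results/weights-1.nc', then 'results/weights-2.nc', and so on.
out_path = safe_write_path('results/weights.nc')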

pyndl.ndl

pyndl.ndl provides functions in order to train NDL models.

class pyndl.ndl.WeightDict(*args, **kwargs)[source]

Bases: defaultdict

Subclass of defaultdict to represent outcome-cue weights.

Notes

The weight for each outcome-cue combination is 0 by default.

Attributes
attrs
default_factory

Factory for default value called by __missing__().

Methods

clear()

copy()

fromkeys(iterable[, value])

Create a new dictionary with keys from iterable and values set to value.

get(key[, default])

Return the value for key if key is in the dictionary, else default.

items()

keys()

pop(key[, default])

If key is not found, default is returned if given, otherwise KeyError is raised

popitem(/)

Remove and return a (key, value) pair as a 2-tuple.

setdefault(key[, default])

Insert key with a value of default if key is not in the dictionary.

update([E, ]**F)

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values()

property attrs
pyndl.ndl.data_array(weights, *, attrs=None)[source]

Convert a dict of dicts of weights or a WeightDict into an xarray.DataArray.

Parameters
weights : dict of dicts of floats or WeightDict

the first dict has outcomes as keys and dicts as values; the second dict has cues as keys and weights as values; weights[outcome][cue] gives the weight between outcome and cue. If a dict of dicts is given, attrs is required; if a WeightDict is given, attrs is optional.

attrs : dict

a dictionary of attributes

Returns
weights : xarray.DataArray

with dimensions ‘outcomes’ and ‘cues’. You can look up the weights between a cue and an outcome with weights.loc[{'outcomes': outcome, 'cues': cue}] or weights.loc[outcome].loc[cue].
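
A minimal sketch (the weight values and the attrs dict are illustrative):

from pyndl import ndl

# attrs is required here because a plain dict of dicts is passed.
weights_dict = {'outcome1': {'cue1': 0.2, 'cue2': 0.0}}
weights = ndl.data_array(weights_dict, attrs={'comment': 'toy weights'})
print(weights.loc[{'outcomes': 'outcome1', 'cues': 'cue1'}])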

pyndl.ndl.dict_ndl(events, alphas, betas, lambda_=1.0, *, weights=None, inplace=False, remove_duplicates=None, make_data_array=False, verbose=False)[source]

Calculate the weights for all_outcomes over all events in event_file.

This is a pure python implementation using dicts.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

alphas : dict or float

a (default)dict having cues as keys and a value below 1 as value

betas : (float, float)

one value for successful prediction (reward), one for punishment

lambda_ : float
weights : dict of dicts or xarray.DataArray or None

initial weights

inplace : {True, False}

if True, calculate the weight matrix inplace; if False, create a new weight matrix to learn on

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

make_data_array : {False, True}

if True, make an xarray.DataArray out of the dict of dicts.

verbose : bool

print some output if True.

Returns
weights : dict of dicts of floats

the first dict has outcomes as keys and dicts as values; the second dict has cues as keys and weights as values; weights[outcome][cue] gives the weight between outcome and cue.

or
weights : xarray.DataArray

with dimensions ‘outcomes’ and ‘cues’. You can look up the weights between a cue and an outcome with weights.loc[{'outcomes': outcome, 'cues': cue}] or weights.loc[outcome].loc[cue].

Notes

The metadata will only be stored when make_data_array is True, and then dict_ndl cannot be used to continue learning. At the moment there is no proper way to automatically store the metadata into the default dict.
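
A minimal dict_ndl sketch with a toy event list (all names and parameter values are illustrative):

from pyndl import ndl

events = [(['cue1', 'cue2'], ['outcome1']),
          (['cue1'], ['outcome2'])]
weights = ndl.dict_ndl(events, alphas=0.1, betas=(0.1, 0.1),
                       remove_duplicates=True)
print(weights['outcome1']['cue1'])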

pyndl.ndl.ndl(events, alpha, betas, lambda_=1.0, *, method='openmp', weights=None, number_of_threads=None, n_jobs=8, len_sublists=None, n_outcomes_per_job=10, remove_duplicates=None, verbose=False, temporary_directory=None, events_per_temporary_file=10000000)[source]

Calculate the weights for all_outcomes over all events in the event file given by its path.

This is a parallel python implementation using numpy, multithreading and the binary format defined in preprocess.py.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

alpha : float

saliency of all cues

betas : (float, float)

one value for successful prediction (reward), one for punishment

lambda_ : float
method : {‘openmp’, ‘threading’}
weights : None or xarray.DataArray

the xarray.DataArray needs to have the named dimensions ‘cues’ and ‘outcomes’

n_jobs : int

an integer giving the number of threads in which the job should be executed

n_outcomes_per_job : int

an integer giving the length of the sublists generated from all outcomes

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

verbose : bool

print some output if True.

temporary_directory : str

path to the directory to use for storing the temporary files created; if none is provided, the operating system’s default will be used (/tmp on unix)

events_per_temporary_file : int

Number of events in each temporary binary file. Has to be larger than 1

Returns
weights : xarray.DataArray

with dimensions ‘outcomes’ and ‘cues’. You can look up the weights between a cue and an outcome with weights.loc[{'outcomes': outcome, 'cues': cue}] or weights.loc[outcome].loc[cue].
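
A minimal sketch of training ndl from an event file (the path and the lookup names are placeholders; the alpha and betas values are illustrative):

from pyndl import ndl

weights = ndl.ndl('events.tab.gz', alpha=0.1, betas=(0.1, 0.1),
                  method='openmp', remove_duplicates=True)
# Look up one weight (assumes 'cue1' and 'outcome1' occur in the events).
print(weights.loc[{'outcomes': 'outcome1', 'cues': 'cue1'}])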

pyndl.ndl.slice_list(list_, len_sublists)[source]

Slices a list into sublists of length len_sublists.

Parameters
list_ : list

list which should be sliced into sublists

len_sublists : int

integer which determines the length of the sublists

Returns
seq_list : list of lists

a list of sublists with the length len_sublists
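
For example (assuming the trailing sublist is simply shorter when the list length is not a multiple of len_sublists):

from pyndl.ndl import slice_list

print(slice_list([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]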

pyndl.preprocess

pyndl.preprocess provides functions in order to preprocess data and create event files from it.

class pyndl.preprocess.JobFilter(keep_cues, keep_outcomes, remove_cues, remove_outcomes, cue_map, outcome_map)[source]

Bases: object

Stores the persistent information over several jobs and exposes a job method that only takes the varying parts as one argument.

Note

Using a closure is not possible as it is not picklable / serializable.

Methods

job

process_cues

process_cues_all

process_cues_keep

process_cues_map

process_cues_remove

process_outcomes

process_outcomes_all

process_outcomes_keep

process_outcomes_map

process_outcomes_remove

return_empty_string

job(line)[source]
process_cues(cues)[source]
process_cues_all(cues)[source]
process_cues_keep(cues)[source]
process_cues_map(cues)[source]
process_cues_remove(cues)[source]
process_outcomes(outcomes)[source]
process_outcomes_all(outcomes)[source]
process_outcomes_keep(outcomes)[source]
process_outcomes_map(outcomes)[source]
process_outcomes_remove(outcomes)[source]
static return_empty_string()[source]
pyndl.preprocess.bandsample(population, sample_size=50000, *, cutoff=5, seed=None, verbose=False)[source]

Creates a sample of size sample_size out of the population using band sampling.

pyndl.preprocess.create_binary_event_files(event_file, path_name, cue_id_map, outcome_id_map, *, sort_within_event=False, n_jobs=2, events_per_file=10000000, overwrite=False, remove_duplicates=None, verbose=False)[source]

Creates the binary event files for a tabular cue outcome corpus.

Parameters
event_file : str

path to tab separated text file that contains all events in a cue outcome table.

path_name : str

folder name where to store the binary event files

cue_id_map : dict (str -> int)

cue to id map

outcome_id_map : dict (str -> int)

outcome to id map

sort_within_event : bool

should we sort the cues and outcomes within the event

n_jobs : int

number of threads to use

events_per_file : int

Number of events in each binary file. Has to be larger than 1

overwrite : bool

overwrite files if they exist

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

verbose : bool
Returns
number_events : int

sum of the number of events written to the binary files

pyndl.preprocess.create_event_file(corpus_file, event_file, *, allowed_symbols='*', context_structure='document', event_structure='consecutive_words', event_options=(3,), cue_structure='trigrams_to_word', lower_case=False, remove_duplicates=True, verbose=False)[source]

Create a text based event file from a corpus file.

Warning

‘_’, ‘#’, and TAB are removed from the input of the corpus file and replaced by a ‘ ‘, which is treated as a word boundary.

Parameters
corpus_file : str

path where the corpus file is

event_file : str

path where the output file will be created

allowed_symbols : str, function

all allowed symbols to include in the events as a set of characters. The set of characters might be explicit or contain Regex character sets.

‘_’, ‘#’, and TAB are special symbols in the event file and will be removed automatically. If the corpus file contains these special symbols a warning will be given.

These examples define the same allowed symbols:

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
'a-zA-Z'
'*'

or a function indicating which characters to include. The function should return True if the passed character is an allowed symbol.

For example:

lambda chr: chr in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
lambda chr: ('a' <= chr <= 'z') or ('A' <= chr <= 'Z')
context_structure : {“document”, “paragraph”, “line”}
event_structure : {“line”, “consecutive_words”, “word_to_word”, “sentence”}
event_options : None or (number_of_words,) or (before, after)

in “consecutive_words” the number of words of the sliding window as an integer; in “word_to_word” the number of words before and after the word of interest, each as an integer.

cue_structure : {“trigrams_to_word”, “word_to_word”, “bigrams_to_word”}
lower_case : bool

should the cues and outcomes be lower cased

remove_duplicates : bool

create unique cues and outcomes per event

verbose : bool

Notes

Breaks / Separators :

What marks parts, where we do not want to continue learning?

  • ---end.of.document--- string?

  • line breaks?

  • empty lines?

What do we consider one event?

  • three consecutive words?

  • one line of the corpus?

  • everything between two empty lines?

  • everything within one document?

Should the events be connected to the events before and after?

No.

Context :

A context is a whole document or a paragraph within which we will take (three) consecutive words as occurrences or events. The last words of a context will not form an occurrence with the first words of the next context.

Occurrence :

An occurrence or event will result in one event in the end. This can be (three) consecutive words, a sentence, or a line in the corpus file.
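
A minimal sketch of create_event_file turning a plain text corpus into trigram-to-word events (both file names are placeholders):

from pyndl import preprocess

preprocess.create_event_file('corpus.txt', 'events.tab.gz',
                             allowed_symbols='a-zA-Z',
                             context_structure='document',
                             event_structure='consecutive_words',
                             event_options=(3,),
                             cue_structure='trigrams_to_word',
                             lower_case=True)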

pyndl.preprocess.event_generator(event_file, cue_id_map, outcome_id_map, *, sort_within_event=False)[source]
pyndl.preprocess.filter_event_file(input_event_file, output_event_file, *, keep_cues='all', keep_outcomes='all', remove_cues=None, remove_outcomes=None, cue_map=None, outcome_map=None, n_jobs=1, number_of_processes=None, chunksize=100000, verbose=False)[source]

Filter an event file by a list or a map of cues and outcomes.

Parameters
You can either use keep_*, remove_*, or map_*.
input_event_file : str

path where the input event file is

output_event_file : str

path where the output file will be created

keep_cues : “all” or sequence of str

list of all cues that should be kept

keep_outcomes : “all” or sequence of str

list of all outcomes that should be kept

remove_cues : None or sequence of str

list of all cues that should be removed

remove_outcomes : None or sequence of str

list of all outcomes that should be removed

cue_map : dict

maps every cue as key to the value. Removes all cues that do not have a key. This can be used to map several different cues to the same cue or to rename cues.

outcome_map : dict

maps every outcome as key to the value. Removes all outcomes that do not have a key. This can be used to map several different outcomes to the same outcome or to rename outcomes.

n_jobs : int

number of threads to use

chunksize : int

number of chunks per submitted job, should be around 100000

Notes

It will keep all cues that are within the event and that (for a human reader) might clearly belong to a removed outcome. This is on purpose and is the expected behaviour as these cues are in the context of this outcome.

If an event has no cues it gets removed, but if an event has no outcomes it is still present in order to capture the background rate of those cues.
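
A minimal filter_event_file sketch that keeps only a whitelist of outcomes (the file names and outcomes are placeholders):

from pyndl import preprocess

preprocess.filter_event_file('events.tab.gz', 'events_filtered.tab.gz',
                             keep_outcomes=('house', 'tree'))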

pyndl.preprocess.ngrams_to_word(occurrences, n_chars, outfile, remove_duplicates=True)[source]

Process the occurrences and write them to outfile.

Parameters
occurrences : sequence of (cues, outcomes) tuples

cues and outcomes are both strings where underscores and # are special symbols.

n_chars : number of characters (e.g. 2 for bigrams, 3 for trigrams, …)
outfile : file handle
remove_duplicates : bool

if True make cues and outcomes per event unique

pyndl.preprocess.process_occurrences(occurrences, outfile, *, cue_structure='trigrams_to_word', remove_duplicates=True)[source]

Process the occurrences and write them to outfile.

Parameters
occurrences : sequence of (cues, outcomes) tuples

cues and outcomes are both strings where underscores and # are special symbols.

outfile : file handle
cue_structure : {‘bigrams_to_word’, ‘trigrams_to_word’, ‘word_to_word’}
remove_duplicates : bool

if True make cues and outcomes per event unique

pyndl.preprocess.read_binary_file(binary_file_path)[source]
pyndl.preprocess.to_bytes(int_)[source]
pyndl.preprocess.to_integer(byte_)[source]
pyndl.preprocess.write_events(events, filename, *, start=0, stop=4294967295, remove_duplicates=None)[source]

Write out a list of events to a disk file in binary format.

Parameters
events : iterator of (cue_ids, outcome_ids) tuples, called events
filename : string
start : first event to write (zero based index)
stop : last event to write (zero based index; excluded)
remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

Returns
number_events : int

actual number of events written to the file

Raises
StopIteration : if the events generator is exhausted before stop is reached

Notes

The binary format has the following structure:

8 byte header
nr of events
nr of cues in first event
ids for every cue
nr of outcomes in first event
ids for every outcome
nr of cues in second event
...

pyndl.wh

pyndl.wh provides functions in order to train Widrow-Hoff (WH) models. In contrast to the Rescorla-Wagner (RW) models, the WH models are not limited to binary cues and outcomes, but can encode gradual intensities in the cues and outcomes. This is done by associating a vector of continuous values (real numbers) with each cue and outcome. The size of the vector has to be the same for all cues and for all outcomes, but can differ between cues and outcomes.

It is possible to calculate weights for continuous cues or continuous outcomes, while keeping the outcomes respectively cues binary. Finally, it is possible for both sides, cues and outcomes, to be continuous and to calculate the Widrow-Hoff learning rule between them.

pyndl.wh.dict_wh(events, eta, cue_vectors, outcome_vectors, *, weights=None, inplace=False, remove_duplicates=None, make_data_array=False, verbose=False)[source]

Calculate the weights for all_outcomes over all events in events.

This is a pure python implementation using dicts.

Parameters
events : generator or str

generates cues, outcomes pairs or the path to the event file

eta : float

learning rate

cue_vectors : xarray.DataArray

matrix that contains the cue vectors for each cue

outcome_vectors : xarray.DataArray

matrix that contains the target vectors for each outcome

weights : dict of dicts or xarray.DataArray or None

initial weights

inplace : {True, False}

if True, calculate the weight matrix inplace; if False, create a new weight matrix to learn on

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

make_data_array : {False, True}

if True, make an xarray.DataArray out of the dict of dicts.

verbose : bool

print some output if True.

Returns
weights : dict of dicts of floats

the first dict has outcomes as keys and dicts as values; the second dict has cues as keys and weights as values; weights[outcome][cue] gives the weight between outcome and cue.

or
weights : xarray.DataArray

with dimensions ‘outcome_vector_dimensions’ and ‘cue_vector_dimensions’. You can look up the weights between a cue dimension and an outcome dimension with weights.loc[{'outcome_vector_dimensions': outcome_vector_dimension, 'cue_vector_dimensions': cue_vector_dimension}] or weights.loc[outcome_vector_dimension].loc[cue_vector_dimension].

Notes

The metadata will only be stored when make_data_array is True, and then dict_wh cannot be used to continue learning. At the moment there is no proper way to automatically store the metadata into the default dict.

Furthermore, this implementation only supports the ‘real to real’ case where cue vectors are learned on outcome vectors. For the ‘binary to real’ or ‘real to binary’ cases the wh.wh function needs to be used which uses a fast cython implementation.

The main purpose of this function is to have a reference implementation which is used to validate the faster cython version against. Additionally, this function can be a good starting point to develop different flavors of the Widrow-Hoff learning rule.

pyndl.wh.wh(events, eta, *, cue_vectors=None, outcome_vectors=None, method='openmp', weights=None, n_jobs=8, n_outcomes_per_job=10, remove_duplicates=None, verbose=False, temporary_directory=None, events_per_temporary_file=10000000)[source]

Calculate the weights for all events using the Widrow-Hoff learning rule in three different flavors.

In the first flavor, cues and outcomes both are vectors and the names in the event files refer to these vectors. The vectors for all cues and outcomes are given as xarray.DataArrays with the arguments cue_vectors and outcome_vectors.

In the second and third flavor, only the cues or only the outcomes are treated as vectors and the ones not being treated as vectors are still considered being present or not being present in a binary way.

This is a parallel python implementation using cython, numpy, multithreading and the binary format defined in preprocess.py.

Parameters
events : str

path to the event file

eta : float

learning rate

cue_vectors : xarray.DataArray

matrix that contains the cue vectors for each cue

outcome_vectors : xarray.DataArray

matrix that contains the target vectors for each outcome

method : {‘openmp’, ‘threading’, ‘numpy’}

‘numpy’ works only for real to real Widrow-Hoff.

weights : None or xarray.DataArray

the xarray.DataArray needs to have the named dimensions ‘cues’ or ‘cue_vector_dimensions’ and ‘outcomes’ or ‘outcome_vector_dimensions’

n_jobs : int

an integer giving the number of threads in which the job should be executed

n_outcomes_per_job : int

an integer giving the number of outcomes that are processed in one job

remove_duplicates : {None, True, False}

if None, raise a ValueError when the same cue is present multiple times in the same event; if True, make cues and outcomes unique per event; if False, keep multiple instances of the same cue or outcome (this is usually not preferred!)

verbose : bool

print some output if True

temporary_directory : str

path to the directory to use for storing the temporary files created; if none is provided, the operating system’s default will be used (‘/tmp’ on unix)

events_per_temporary_file : int

Number of events in each temporary binary file. Has to be larger than 1

Returns
weights : xarray.DataArray

the dimensions of the weights reflect the type of Widrow-Hoff that was run (real to real, binary to real, real to binary or binary to binary). The dimension names reflect this in the weights: they are a combination of ‘outcomes’ x ‘outcome_vector_dimensions’ and ‘cues’ x ‘cue_vector_dimensions’. You can look up the weights between an outcome vector dimension and a cue vector dimension with weights.loc[{'outcome_vector_dimensions': outcome_vector_dimension, 'cue_vector_dimensions': cue_vector_dimension}] or weights.loc[outcome_vector_dimension].loc[cue_vector_dimension].
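
A minimal sketch of the ‘real to real’ flavor (the event file path, all names, and all vector values are illustrative):

import numpy as np
import xarray as xr
from pyndl import wh

# Every cue and outcome name in the event file refers to one of these vectors.
cue_vectors = xr.DataArray(np.array([[1.0, 0.0], [0.0, 1.0]]),
                           dims=('cues', 'cue_vector_dimensions'),
                           coords={'cues': ['cue1', 'cue2'],
                                   'cue_vector_dimensions': ['c_dim1', 'c_dim2']})
outcome_vectors = xr.DataArray(np.array([[1.0], [-1.0]]),
                               dims=('outcomes', 'outcome_vector_dimensions'),
                               coords={'outcomes': ['outcome1', 'outcome2'],
                                       'outcome_vector_dimensions': ['o_dim1']})

weights = wh.wh('events.tab.gz', eta=0.01,
                cue_vectors=cue_vectors,
                outcome_vectors=outcome_vectors,
                method='numpy')  # 'numpy' supports the real to real case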