Module learn_model

learn_model -- Program that learns machine translation quality estimation models

learn_model is a program for learning sentence-pair quality estimation models using the algorithms implemented in the scikit-learn machine learning toolkit.

It defines functions for working with different machine learning algorithms, feature selection techniques and feature preprocessing. The only dependency so far is the sklearn package. ConfigParser is used to parse the configuration file, which has a layout similar to a Java properties file.
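
As a rough illustration of the properties-like layout mentioned above, the sketch below parses a small configuration with Python 3's configparser (the module called ConfigParser in Python 2). The section and key names (learning, method, scorer, folds) and the load_config helper are hypothetical and are not taken from learn_model itself.

```python
import configparser

# Hypothetical configuration text; learn_model's real sections and keys may differ.
SAMPLE_CONFIG = """
[learning]
method = SVR
scorer = mae
folds = 5
"""


def load_config(text):
    """Parse INI/properties-style text into a nested dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_config(SAMPLE_CONFIG)
folds = int(config["learning"]["folds"])  # every parsed value is a string
```

Because every parsed value is a string, numeric settings such as the number of folds have to be converted explicitly before they are passed to a learner.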

Author: Jose' de Souza

License: Apache License 2.0

Contact: jose.camargo.souza@gmail.com

Date: 2012-11-01

Updated: 2012-11-01

Classes

CLIError
Generic exception to raise and log different fatal errors.

Functions

set_selection_method(config, threshold=.25)
Given the configuration settings, this function instantiates the configured feature selection method initialized with the preset parameters.

set_scorer_functions(scorers)

set_optimization_params(opt)

optimize_model(estimator, X_train, y_train, params, scores, folds, verbose, n_jobs)

set_learning_method(config, X_train, y_train)
Instantiates the sklearn class corresponding to the learning method set in the configuration file.

fit_predict(config, X_train, y_train, X_test=None, y_test=None)
Uses the configuration dictionary settings to train a model using the specified training algorithm.

cross_validate(config, X_train, y_train)
Uses the configuration dictionary settings to train a model using the specified training algorithm.

run(config)
Runs the main code of the program.

run_crossvalidation(config)
Runs the main code in cross-validation-only mode.

main(argv=None)
Command line options.

Variables

DEBUG = 0

PROFILE = 0

DEFAULT_SEP = "\t"

Function Details

set_selection_method(config, threshold=.25)

Given the configuration settings, this function instantiates the configured feature selection method initialized with the preset parameters.

TODO: implement the same method using reflection (load the class dynamically at runtime)

Parameters:

config - the configuration file object loaded using yaml.load()

Returns:

an object that implements the TransformerMixin class (with fit(), fit_transform() and transform() methods).
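
The reflection idea in the TODO above could look roughly like the following sketch, which loads a class from a dotted name with importlib and instantiates it with keyword parameters. The helper names load_class and instantiate are hypothetical; they are not part of learn_model. The demonstration uses a standard-library class so that it runs without scikit-learn installed.

```python
import importlib


def load_class(dotted_path):
    """Return the class named by a 'package.module.ClassName' string."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.ClassName', got %r" % dotted_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def instantiate(dotted_path, **params):
    """Load a class by name and construct it with keyword parameters."""
    return load_class(dotted_path)(**params)


# Demonstrated with a standard-library class so the sketch runs anywhere.
# In this module's setting the path would instead name a scikit-learn class
# read from the configuration file.
counter = instantiate("collections.Counter", a=2, b=1)
```

With this approach the configuration file would carry the full class path and its parameters, so supporting a new selection method would not require a new branch in the code.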

set_learning_method(config, X_train, y_train)

Instantiates the sklearn class corresponding to the learning method set in the configuration file.

TODO: use reflection to instantiate the classes

Parameters:

config - configuration object

Returns:

an estimator with fit() and predict() methods
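
A minimal sketch of this kind of name-to-class dispatch is shown below. It assumes a current scikit-learn installation; the ESTIMATORS table, the method names in it and the make_estimator helper are hypothetical, since the names that learn_model actually accepts are not listed in this documentation.

```python
from sklearn.linear_model import Lasso
from sklearn.svm import SVC, SVR

# Hypothetical method names mapped to scikit-learn estimator classes.
ESTIMATORS = {"SVR": SVR, "SVC": SVC, "Lasso": Lasso}


def make_estimator(method, params=None):
    """Instantiate the estimator registered under method."""
    try:
        estimator_class = ESTIMATORS[method]
    except KeyError:
        raise ValueError("unknown learning method: %r" % method)
    return estimator_class(**(params or {}))


estimator = make_estimator("SVR", {"C": 10.0, "kernel": "rbf"})
```

Any object returned this way exposes fit() and predict(), which matches the return contract described above.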

fit_predict(config, X_train, y_train, X_test=None, y_test=None)

Uses the configuration dictionary settings to train a model using the specified training algorithm. If a test set is given, also evaluates the trained model on it. Additionally, performs feature selection and model parameter optimization.

Parameters:

config - the configuration dictionary obtained by parsing the configuration file.
X_train - the np.array object for the matrix containing the feature values for each instance in the training set.
y_train - the np.array object for the response values of each instance in the training set.
X_test - the np.array object for the matrix containing the feature values for each instance in the test set. Default is None.
y_test - the np.array object for the response values of each instance in the test set. Default is None.
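
The combination of feature selection, parameter optimization and optional test-set evaluation described above might be sketched as follows. This is not the module's implementation: it assumes a current scikit-learn (sklearn.model_selection), hard-codes a selector, a learner and a parameter grid that the real function would read from the configuration, and returns a tuple whose shape is chosen here only for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR


def fit_predict_sketch(X_train, y_train, X_test=None, y_test=None):
    """Select features, tune an SVR by grid search, optionally evaluate."""
    pipeline = Pipeline([
        ("select", SelectKBest(score_func=f_regression, k=2)),
        ("model", SVR(kernel="rbf")),
    ])
    # Hypothetical parameter grid; a real run would take it from the config.
    grid = {"model__C": [1.0, 10.0], "model__gamma": [0.01, 0.1]}
    search = GridSearchCV(pipeline, grid,
                          scoring="neg_mean_absolute_error", cv=3)
    search.fit(X_train, y_train)

    if X_test is None:
        return search.best_estimator_, None, None
    predictions = search.predict(X_test)
    mae = None if y_test is None else mean_absolute_error(y_test, predictions)
    return search.best_estimator_, predictions, mae


# Synthetic data: 60 instances, 4 features, target driven by the first two.
rng = np.random.RandomState(0)
X = rng.rand(60, 4)
y = 2.0 * X[:, 0] - X[:, 1] + 0.05 * rng.randn(60)
model, predictions, mae = fit_predict_sketch(X[:40], y[:40], X[40:], y[40:])
```

Putting the selector inside the pipeline means it is refitted on each training fold during the grid search, so the selection step never sees the held-out fold.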

cross_validate(config, X_train, y_train)

Parameters:

config - the configuration dictionary obtained by parsing the configuration file.
X_train - the np.array object for the matrix containing the feature values for each instance in the training set.
y_train - the np.array object for the response values of each instance in the training set.
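
For orientation, k-fold cross-validation over a training set can be sketched with scikit-learn as below. The helper name cross_validate_sketch, the choice of mean absolute error as the score and the default of five folds are assumptions made for this example; the documentation does not say which scorers or fold counts cross_validate uses.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR


def cross_validate_sketch(estimator, X_train, y_train, folds=5):
    """Return the mean absolute error obtained on each fold."""
    splitter = KFold(n_splits=folds, shuffle=True, random_state=0)
    scores = cross_val_score(estimator, X_train, y_train,
                             scoring="neg_mean_absolute_error", cv=splitter)
    return -scores  # scikit-learn reports errors as negated scores


# Synthetic data: 50 instances, 3 features.
rng = np.random.RandomState(1)
X = rng.rand(50, 3)
y = X[:, 0] + 0.1 * rng.randn(50)
fold_mae = cross_validate_sketch(SVR(), X, y, folds=5)
```

The result holds one error value per fold; their mean and standard deviation give a quick estimate of how stable the model is across splits.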

run(config)

Runs the main code of the program. Checks for mandatory parameters, opens input files and performs the learning steps.

run_crossvalidation(config)

Runs the main code in cross-validation-only mode. Checks for mandatory parameters, opens input files and performs the learning steps.