
Module learn_model

learn_model -- Program that learns machine translation quality estimation models

learn_model is a program for learning sentence-pair quality estimation models using the algorithms implemented in the scikit-learn machine learning toolkit.

It defines functions for working with different machine learning algorithms, feature selection techniques, and feature preprocessing. The only dependency so far is the sklearn package. ConfigParser is used to parse the configuration file, which has a layout similar to a Java properties file.


Author: Jose' de Souza

Copyright: 2012. All rights reserved.

License: Apache License 2.0

Contact: jose.camargo.souza@gmail.com

Date: 2012-11-01

Updated: 2012-11-01

Classes
  CLIError
Generic exception to raise and log different fatal errors.
Functions
 
set_selection_method(config, threshold=.25)
Given the configuration settings, this function instantiates the configured feature selection method initialized with the preset parameters.
 
set_scorer_functions(scorers)
 
set_optimization_params(opt)
 
optimize_model(estimator, X_train, y_train, params, scores, folds, verbose, n_jobs)
 
set_learning_method(config, X_train, y_train)
Instantiates the sklearn class corresponding to the learning method set in the configuration file.
 
fit_predict(config, X_train, y_train, X_test=None, y_test=None)
Uses the configuration dictionary settings to train a model using the specified training algorithm.
 
cross_validate(config, X_train, y_train)
Uses the configuration dictionary settings to train a model using the specified training algorithm.
 
run(config)
Runs the main code of the program.
 
run_crossvalidation(config)
Runs the cross-validation-only variant of the main program.
 
main(argv=None)
Command line options.
Variables
  DEBUG = 0
  PROFILE = 0
  DEFAULT_SEP = "\t"
Function Details

set_selection_method(config, threshold=.25)

Given the configuration settings, this function instantiates the configured feature selection method initialized with the preset parameters.

TODO: implement the same method using reflection (load the class dynamically at runtime)

Parameters:
  • config - the configuration file object loaded using yaml.load()
Returns:
an object that implements the TransformerMixin class (with fit(), fit_transform() and transform() methods).
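
The reflection approach mentioned in the TODO above could look roughly like the following sketch. It is not the module's implementation: the helper names load_class and instantiate are hypothetical, and the demonstration uses a standard-library class so that it runs without sklearn installed. A feature selection class would be resolved the same way from its own dotted path.

```python
import importlib


def load_class(dotted_path):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.ClassName', got %r" % (dotted_path,))
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def instantiate(dotted_path, **params):
    """Import the class named by dotted_path and call it with params."""
    return load_class(dotted_path)(**params)


# Demonstration with a standard-library class (assumption: in the real
# program the dotted path and the parameters would come from the config).
counter = instantiate("collections.Counter", a=2, b=1)
print(counter.most_common(1))
```

With this pattern, supporting a new method requires only a new configuration value rather than a new branch in the code.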

set_learning_method(config, X_train, y_train)

Instantiates the sklearn class corresponding to the learning method set in the configuration file.

TODO: use reflection to instantiate the classes

Parameters:
  • config - configuration object
Returns:
an estimator with fit() and predict() methods
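
As an illustration of the fit()/predict() contract and of selecting an estimator from the configuration, here is a minimal sketch. Everything in it is an assumption made for the example: the toy estimators stand in for sklearn classes, and the "learning", "method" and "parameters" keys are hypothetical, not the program's actual configuration layout.

```python
class MeanRegressor:
    """Toy estimator: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MedianRegressor:
    """Toy estimator: always predicts the median of the training targets."""

    def fit(self, X, y):
        ordered = sorted(y)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            self.median_ = ordered[mid]
        else:
            self.median_ = (ordered[mid - 1] + ordered[mid]) / 2
        return self

    def predict(self, X):
        return [self.median_ for _ in X]


# Hypothetical registry mapping a configured method name to a class.
ESTIMATORS = {"mean": MeanRegressor, "median": MedianRegressor}


def make_estimator(config):
    """Return an unfitted estimator for the configured method name."""
    learning = config.get("learning", {})
    name = learning.get("method")
    if name not in ESTIMATORS:
        raise ValueError("unknown learning method: %r" % (name,))
    return ESTIMATORS[name](**learning.get("parameters", {}))


config = {"learning": {"method": "median"}}
model = make_estimator(config).fit([[0], [1], [2]], [1.0, 5.0, 3.0])
print(model.predict([[9]]))
```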

fit_predict(config, X_train, y_train, X_test=None, y_test=None)

Uses the configuration dictionary settings to train a model using the specified training algorithm. If set, also evaluates the trained model on a test set. Additionally, performs feature selection and model parameter optimization.

Parameters:
  • config - the configuration dictionary obtained parsing the configuration file.
  • X_train - the np.array object for the matrix containing the feature values for each instance in the training set.
  • y_train - the np.array object for the response values of each instance in the training set.
  • X_test - the np.array object for the matrix containing the feature values for each instance in the test set. Default is None.
  • y_test - the np.array object for the response values of each instance in the test set. Default is None.
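
The train-then-optionally-evaluate flow described above can be sketched as follows. This is a simplified illustration, not the function's source: it omits feature selection and parameter optimization, uses plain lists instead of np.array objects to stay dependency-free, and the names fit_predict_sketch, MeanRegressor and mean_absolute_error are invented for the example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class MeanRegressor:
    """Toy stand-in for an sklearn estimator: predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_predict_sketch(estimator, X_train, y_train, X_test=None, y_test=None):
    """Fit on the training set; predict and score only if test data is given."""
    estimator.fit(X_train, y_train)
    if X_test is None:
        return estimator, None, None
    y_hat = estimator.predict(X_test)
    # Predictions are possible without references, scoring is not.
    score = mean_absolute_error(y_test, y_hat) if y_test is not None else None
    return estimator, y_hat, score


X_train, y_train = [[0.1], [0.4], [0.7]], [1.0, 2.0, 3.0]
X_test, y_test = [[0.2], [0.9]], [1.5, 3.0]

_, y_hat, mae = fit_predict_sketch(MeanRegressor(), X_train, y_train, X_test, y_test)
print(y_hat, mae)
```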

cross_validate(config, X_train, y_train)

Uses the configuration dictionary settings to train a model using the specified training algorithm. If set, also evaluates the trained model on a test set. Additionally, performs feature selection and model parameter optimization.

Parameters:
  • config - the configuration dictionary obtained parsing the configuration file.
  • X_train - the np.array object for the matrix containing the feature values for each instance in the training set.
  • y_train - the np.array object for the response values of each instance in the training set.
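
For readers unfamiliar with the technique, k-fold cross-validation over a training set can be sketched as below. The page does not say how cross_validate splits the data or which score it reports, so the contiguous unshuffled folds, the mean absolute error metric, and all names here are assumptions chosen for a dependency-free illustration.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


class MeanRegressor:
    """Toy stand-in for an sklearn estimator: predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def cross_validate_sketch(make_estimator, X, y, n_folds=3):
    """Return one mean absolute error per fold, using a fresh estimator each time."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), n_folds):
        estimator = make_estimator()
        estimator.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = estimator.predict([X[i] for i in test_idx])
        truth = [y[i] for i in test_idx]
        errors = [abs(t - p) for t, p in zip(truth, predictions)]
        scores.append(sum(errors) / len(errors))
    return scores


X = [[i] for i in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fold_scores = cross_validate_sketch(MeanRegressor, X, y, n_folds=3)
print(fold_scores)
```

Each instance is used for evaluation exactly once, which is why no separate test set is needed.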

run(config)

Runs the main code of the program. Checks for mandatory parameters, opens input files and performs the learning steps.
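
The check for mandatory parameters might be sketched as below, assuming the configuration has already been loaded into nested dictionaries. The key names ("x_train", "y_train", "learning.method") and both helper functions are hypothetical. The actual mandatory parameters are not listed on this page.

```python
def missing_parameters(config, required):
    """Return the dotted key paths from required that are absent in config."""
    missing = []
    for path in required:
        node = config
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


def check_mandatory(config, required):
    """Raise ValueError naming every missing mandatory parameter."""
    missing = missing_parameters(config, required)
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))


# Hypothetical mandatory keys, for illustration only.
REQUIRED = ["x_train", "y_train", "learning.method"]

config = {"x_train": "train.features.tsv", "learning": {"method": "SVR"}}
print(missing_parameters(config, REQUIRED))
```

Collecting every missing key before failing lets the user fix the configuration file in one pass.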

run_crossvalidation(config)

Runs the cross-validation-only variant of the main program. Checks for mandatory parameters, opens input files and performs the learning steps.