Starting from human annotations, we present a machine-learning strategy that performs preference ranking of alternative machine translations of the same source, at sentence level. Rankings are decomposed into pairwise comparisons so that they can be learned by binary classifiers, using black-box features derived from linguistic analysis. When recomposing a ranking from the classifier's pairwise decisions, weighing each decision by its classification probability increases the correlation coefficient by 80%. We also demonstrate several configurations of successful automatic ranking models. The best configurations achieve a correlation with human judgments, measured by Kendall's tau, of 0.27. Although the method uses no reference translations, this correlation is comparable to that achieved by state-of-the-art reference-aware automatic evaluation metrics such as smoothed BLEU, METEOR and Levenshtein distance.
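The decomposition and recomposition steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the soft-voting recomposition rule, and the `prob(a, b)` classifier interface (returning the estimated probability that translation `a` outranks `b`) are all assumptions made for the example.

```python
# Hypothetical sketch: decompose a ranking into pairwise training
# examples, then recompose a ranking from probability-weighted
# pairwise decisions (soft voting over classifier probabilities).

from itertools import combinations

def decompose(ranking):
    """Turn a ranking (list of translation ids, best first) into
    pairwise examples (a, b, label), label=1 iff a ranks above b."""
    pairs = []
    for i, j in combinations(range(len(ranking)), 2):
        pairs.append((ranking[i], ranking[j], 1))
        pairs.append((ranking[j], ranking[i], 0))
    return pairs

def recompose(items, prob):
    """Rank items by summing classification probabilities: an item's
    score is the total probability mass of it beating every rival,
    so each pairwise decision is weighed by the classifier's
    confidence rather than counted as a hard vote."""
    scores = {a: sum(prob(a, b) for b in items if b != a) for a in items}
    return sorted(items, key=scores.get, reverse=True)
```

For example, with three candidate translations and a toy `prob` function, `decompose(["t1", "t2", "t3"])` yields six labeled pairs, and `recompose` orders the candidates by their summed win probabilities.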