Revisiting ECM on GPUs

Jonas Wloka, Jan Richter-Brockmann, Colin Stahlke, Thorsten Kleinjung, Christine Priplata, Tim Güneysu

In: 19th International Conference on Cryptology and Network Security (CANS 2020). International Conference on Cryptology and Network Security (CANS-2020) December 14-16 Vienna Austria Lecture Notes in Computer Science (LNCS) Springer 2020.


Mo­dern pu­blic-key cryp­to­gra­phy is a cru­ci­al part of our con­tem­pora­ry life where a se­cu­re com­mu­ni­ca­ti­on chan­nel with ano­ther party is nee­ded. With the ad­van­ce of more power­ful com­pu­ting ar­chi­tec­tu­res – es­pe­ci­al­ly Gra­phics Pro­ces­sing Units (GPUs) – tra­di­tio­nal ap­proa­ches like RSA and Diffie-Hell­man sche­mes are more and more in dan­ger of being bro­ken. We pre­sent a high­ly op­ti­mi­zed im­ple­men­ta­ti­on of Len­stra’s ECM al­go­rithm cust­o­mi­zed for GPUs. Our im­ple­men­ta­ti­on uses sta­te-of-the-art el­lip­tic curve arith­me­tic and op­ti­mi­zed in­te­ger arith­me­tic while pro­vi­ding the pos­si­bi­li­ty of ar­bi­tra­ri­ly sca­ling ECM’s pa­ra­me­ters al­lowing an ap­p­li­ca­ti­on even for lar­ger dis­cre­te lo­ga­rithm pro­blems. Fur­ther­mo­re, the pro­po­sed soft­ware is not li­mi­ted to any specific GPU ge­ne­ra­ti­on and is to the best of our know­ledge the first im­ple­men­ta­ti­on sup­porting mul­ti­ple de­vice com­pu­ta­ti­on. To this end, for a bound of B1=8,192 and a mo­du­lus size of 192 bit, we achie­ve a through­put of 214 thousand ECM tri­als per se­cond on a mo­dern RTX 2080 Ti GPU con­s­i­de­ring only the first stage of ECM. To solve the dis­cre­te lo­ga­rithm pro­blem for lar­ger bit sizes, our soft­ware can ea­si­ly sup­port lar­ger pa­ra­me­ter sets such that a through­put of 2,781 ECM tri­als per se­cond is achie­ved using B1=50,000, B2=5,000,000, and a mo­du­lus size of 448 bit.

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence