Modeling Algorithm Performance on Highly-threaded Many-core Architectures