Anomalous Cluster Runtime Effect

+1 vote
asked Nov 30 by yixuan (140 points)

Hi,

I have run ITensor code on my personal computer and on my school's cluster and compared the results. The CPU time per sweep was much longer on the cluster than on my computer (usually 5-8 times), which was really unsettling. I repeated the tests several times, changing only one variable at a time.

When I ran code that did not involve ITensor, the CPU time on the cluster was about 3 times shorter than on my computer, which seemed normal because the cluster does have better performance.

ITensor was built with the same platform (LAPACK), the same compiler (GCC 6.4.0), and the same flags on both machines.

I tested 2 different codes on the cluster and on my computer. One was the sample code "dmrg.cc" and the other was a 2D model that I wrote with a custom siteset and iqdmrg. The result was the same for both: 5-8 times slower on the cluster.

I kept the same parameters in every run. The calculated observables (ground state energy, magnetization) were identical to 10 decimals, so I believe there was no extra hidden procedure running on the cluster.

I tried switching to mpirun and it didn't make any difference. I did notice that the executable file created on the cluster was smaller than the one on my computer. I assumed that was because I use Windows while the cluster runs Linux.

Has anyone else had this kind of problem before?

Best,
Yixuan

commented Dec 1 by miles (19,650 points)
Hi Yixuan,
Was the "wall time" also much slower / longer on the cluster than on your personal computer? I ask because one explanation could be that the definition of cpu time as reported by ITensor DMRG is misleading, and could just be a reflection of the greater number of cores present on your cluster machine.
commented Dec 1 by yixuan (140 points)
Hi Miles,
Thank you for your prompt reply. Yes, the "wall time" was also much longer on the cluster, as was the CPU time. I have also tried staring at the computer while waiting for it to finish, and it really did take longer on the cluster.
commented Dec 1 by miles (19,650 points)
Hi Yixuan,
Thanks. Hm, so it will be hard to guess an answer without more information. The only thing I can guess is that the LAPACK & BLAS installed on the cluster may not be a very good one (ideally most of the running time of ITensor is spent in BLAS routines). Is there another BLAS/LAPACK distribution available on your cluster? A very good one to use, if available, is Intel MKL.
commented Dec 1 by yixuan (140 points)
Miles,
I just checked the version of LAPACK & BLAS. It was 3.7.1 on both my computer and the cluster. I upgraded the cluster to 3.8.0, but that didn't make it any faster. I will try to install Intel MKL later and let you know if it helps.
commented Dec 1 by miles (19,650 points)
Hi Yixuan,
Thanks, but the version may not be as important as whether that distribution of lapack is well optimized for the particular type of cpu of your cluster machine. One of the good things about Intel MKL is that it is extremely fine tuned for each type of Intel cpu. So yes please give that a try if you can.
commented Dec 1 by yixuan (140 points)
Hi Miles,
I have installed Intel MKL on the cluster and switched to it, and the problem is solved. The runtime on the cluster is now shorter than on my computer. I think it is because Intel MKL is better tuned for Intel CPUs, just as you said. Meanwhile, I am still using the plain LAPACK/BLAS on my computer, which runs Cygwin on Windows with an Intel CPU. Why that combination performed well on my computer remains a mystery.

1 Answer

+1 vote
answered Dec 1 by miles (19,650 points)

(Answering here to mark the question as answered - see above discussion.)

Yixuan,
Great to hear that you were able to resolve the issue. Even if the original BLAS implementation had worked OK, MKL would probably be significantly better just due to the high quality of its implementation, so I think it would have been worth switching to MKL either way. It's not too surprising that different BLAS implementations can behave rather differently on different CPUs and operating systems, since they are so finely tuned for specific environments. It's good to know about these things, so thanks for asking.

Best,
Miles
