Can Julia compete with Python?
Julia: Taking stock of how it has done
I came across a question from 2012 that had a very good discussion about Julia as an alternative to R / Python for different types of statistical work.
Here is the original 2012 question about Julia's promise.
Unfortunately, Julia was very new at the time and the toolkits needed for statistical work were somewhat primitive. Bugs were still being ironed out. Distributions were difficult to install. And so on.
Someone made a very apt comment on that question:
However, it will take another 5 years before this question can be answered in hindsight. As of now, Julia is missing the following critical aspects of a statistical programming system that could compete with R for everyday users:
That was in 2012. Now that it's 2015 and three years have passed, I was wondering how people think Julia has done.
Do people now have richer experience with the language itself and with the wider Julia ecosystem? I would like to know:
- Would you advise new users of statistical tools to learn Julia over R?
- In what statistics use cases would you advise someone to use Julia?
- If R is slow on a particular task, does it make sense to switch to Julia or Python?
Note: First published June 14, 2015.
I switched to Julia and here are my pragmatic reasons:
- It is really good at gluing code. I have a lot of older code in MATLAB, and MATLAB.jl took 5 minutes to install, works fine, and has a concise syntax that makes calling MATLAB functions feel natural. Julia does the same for R, Python, C, Fortran and many other languages.
- Julia does parallelism really well. I'm not only talking about parallelism across multiple processors (shared memory), but also across multiple nodes. I have access to some HPC nodes that don't get used much because each one is quite slow, so I decided to give Julia a try. I added @parallel to a loop, gave Julia the machine file, and bam, all 5 nodes were in use. Try that in R / Python. With MPI it would take a while to get working (and only if you know what you are doing), not a few minutes on your first try!
- Julia's vectorized code is fast (faster than any other high-level language in many cases) and its devectorized code is almost as fast as C. When you write scientific algorithms, you usually write them first in MATLAB and then again in C. Julia lets you write them once, tune them for the compiler, and 5 minutes later they are fast. Even if you skip that step, you can just write the code in whatever way feels natural and it will run well. In R / Python, you sometimes have to think hard to get a good vectorized version (which can be difficult to understand later).
- The metaprogramming is great. Think of how many times you have thought "I wish I could ______ in this language". Write a macro for it. Usually someone already has.
- Everything is on GitHub. The source code. The packages. It is super easy to read the code, report problems to the developers, talk to them to find out how to do something, or even improve packages yourself.
- They have some really good libraries. For statistics, you would probably be interested in their optimization packages (JuliaOpt is a group that manages these). The numeric packages are already top notch and are only improving.
I still love RStudio a lot, but the new Juno on Atom is really nice. Once it is stable and no longer under heavy development, I can see it being better than RStudio because plugins are so easy (for example, it has a good plugin for adapting to HiDPI screens). So I think Julia is a good language to learn now. It has worked out fine for me so far. YMMV.
I think "learning X over Y" is not the right way to phrase the question. In fact, you can learn both (at least the basics) and choose the right tool for the specific task at hand. And since Julia inherited most of its syntax and concepts from other languages, it should be really easy to pick up (as is Python, although I'm not sure the same applies to R).
Which language is better suited for which task? Based on my experience with these tools, I would rate them as follows:
For pure statistical research that can be done in a REPL with a few scripts, R seems to be the perfect choice. It was developed specifically for statistics, has the longest history of these tools, and probably the largest set of statistical libraries.
If you want to integrate statistics (or machine learning, for example) into a production system, Python seems to be a much better alternative: as a general-purpose programming language, it has a fantastic web stack, bindings to most APIs, and libraries for everything from web scraping to creating 3D games.
High-performance algorithms are much easier to write in Julia. If you just need to use or combine existing libraries backed by C / C++, like scikit-learn or e1071, then Python and R are fine. However, when it comes to writing the fast backend itself, Julia is a real time-saver: it's much faster than Python or R and doesn't require any additional knowledge of C / C++. For example, Mocha.jl re-implements in pure Julia the deep learning framework Caffe, which was originally written in C++ with a Python wrapper.
Also, don't forget that some libraries are only available in certain languages. E.g. only Python has a mature ecosystem for computer vision, some shape-matching and transformation algorithms are only implemented in Julia, and I have heard of some unique packages for medical statistics in R.
(b) In what kind of statistics use cases would you advise someone to use Julia?
(c) If R is slow on a particular task, does it make sense to switch to Julia or Python?
High-dimensional and computationally intensive problems.
- Multiprocessing. Julia's single-node parallel functions are much more convenient than those in Python. For example, in Python you can't use a multiprocessing pool to map-reduce from the REPL, and each function you want to parallelize requires a lot of boilerplate.
- Cluster computing. A Julia package lets you use a compute cluster almost as if it were a single machine with multiple cores. [I played with making this feel more like scripting in ClusterUtils.]
- Shared memory. Julia's shared-memory objects are superior to the corresponding objects in Python.
- Speed. My Julia implementation is faster (on a single machine) than my R implementation for random number generation and linear algebra (it supports multithreaded BLAS).
- Interoperability. A Julia module gives you access to the Python ecosystem without writing a wrapper. There is something similar for R, but I haven't tried it. There are also libraries for C / Fortran.
- GPU. Julia's CUDA wrappers are far more advanced than those in Python (R's were almost nonexistent when I checked). I suspect this will remain the case, as it is much easier to call external libraries in Julia than in Python.
- Ecosystem. Julia's package module uses GitHub as a backend. I believe this will have a big impact on the long-term maintainability of Julia modules, as it makes it much easier to submit patches or hand over ownership of a package.
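The claim that external libraries are easier to call from Julia is easiest to weigh against the Python side. Below is a minimal sketch using Python's standard-library ctypes module. The C math library and its cos function are arbitrary stand-ins for "an external library", and the library lookup assumes a Unix-like system (Linux or macOS).

```python
import ctypes
import ctypes.util

# Locate the C math library (assumes a Unix-like system). If find_library
# returns None, CDLL(None) on POSIX opens the symbols already loaded into
# the Python process, which usually still include libm's functions.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# ctypes does not read C headers, so the signature is declared by hand;
# without restype, ctypes would assume the function returns a C int.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() through ctypes."""
    return libm.cos(x)


if __name__ == "__main__":
    print(c_cos(0.0))  # prints 1.0
```

Every foreign function needs this kind of hand-written type declaration. Getting it wrong usually produces garbage values rather than an error.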
Writing fast code for big problems will increasingly depend on parallel computing. Python is inherently unfriendly to parallelism (because of the GIL), and R has no native multiprocessing, AFAIK. Julia lets you write high-performance code without dropping down to C, while keeping the feel of Python / R / MATLAB.
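In CPython, the GIL means CPU-bound code is usually parallelized with processes rather than threads. Below is a minimal sketch of that approach using the standard-library multiprocessing.Pool for a toy map-reduce (summing squares). The function names and the problem are illustrative only. The sketch also shows where the boilerplate mentioned under Multiprocessing comes from. The Python documentation notes that Pool examples may not work in the interactive interpreter, because worker processes must be able to import the functions they run.

```python
from functools import reduce
from multiprocessing import Pool
import operator


def square(x: int) -> int:
    # Pool sends work to the worker processes by pickling a reference to
    # this function, so it must live at module top level. A lambda, or a
    # function typed into an interactive session that spawned workers
    # cannot import, will not work.
    return x * x


def parallel_sum_of_squares(n: int, processes: int = 4) -> int:
    # Map step: runs in separate worker processes, sidestepping the GIL.
    with Pool(processes=processes) as pool:
        squares = pool.map(square, range(n), chunksize=1_000)
    # Reduce step: runs in the parent process.
    return reduce(operator.add, squares, 0)


if __name__ == "__main__":
    # Required under the "spawn" start method (the default on Windows and
    # macOS): child processes import this module, and the guard keeps them
    # from starting pools of their own.
    print(parallel_sum_of_squares(100_000))
```

The main guard and the top-level function are exactly the pieces that make this awkward to use from a REPL.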
The main disadvantage of Julia, coming from Python / R, is the lack of documentation outside the core functionality. Python is very mature, and whatever you can't find in the docs is usually on Stack Overflow. R's documentation system is pretty good by comparison.
(a) Would you advise new users of statistical tools to learn Julia over R?
Yes, if your use cases fall among those in part (b). When your use case involves a lot of heterogeneous work