Introduction

This is a review of hardware and software used for econometric computation, based on personal experience. I will walk through the available technology and then make my recommendation. I will also discuss the setup for a cloud virtual machine. What you need will depend on what you do. I usually work with dataframes smaller than 10GB, I run regression models on observational data about people and firms (i.e., OLS, fixed effects, and panel models), and my deliverables are mostly office files (docs, sheets, slides) and sometimes LaTeX.

Statistical Packages

Stata is the gold standard for econometrics. While Python and R offer features that improve the coding experience, neither yet supports econometrics in the same depth as Stata. I recommend using Stata for regression modeling and adding Python or R as an optional tool for data cleaning and visualization.

Stata

  • Gold standard for econometric analysis.
  • Stable and fast.
  • Excellent documentation, including material written by academics, and Q&A on Statalist.
  • Stata's built-in code editor lacks features found in modern code editors and IDEs (e.g., Jupyter, RStudio, Visual Studio Code).
  • Linux compatibility (for Stata 15) is stuck at Ubuntu 16: later Ubuntu releases have dependency problems, which affects running Stata on cloud virtual machines.
  • The syntax is built around working with a single dataset in memory, which makes data cleaning cumbersome. Stata 16 introduced frames (multiple datasets in memory), so this is improving.

Python (Pandas)

  • Python's pandas package is popular for data cleaning in industry (production ready).
  • Python is a popular general-purpose programming language, which makes it a marketable skill.
  • Excellent community support on Stack Exchange (Stack Overflow).
  • Package management via Anaconda.
  • Jupyter Notebook and JupyterLab (combine notes and code, and run code inside the notebook).
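A minimal pandas sketch of a typical cleaning step follows. The two tables, their column names, and their values are hypothetical. It converts a text wage column to numbers and then merges person-level and firm-level data. Both tables stay in memory at the same time, and indicator=True adds a _merge column that plays a role similar to the _merge variable created by Stata's merge command.

```python
import pandas as pd

# Hypothetical person-level and firm-level tables; names are illustrative only.
people = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "firm_id": [10, 10, 20, 30],
    "wage": ["52,000", "61,500", None, "48,250"],
})
firms = pd.DataFrame({
    "firm_id": [10, 20],
    "industry": ["health", "retail"],
})

# Clean: drop thousands separators and convert wage to numeric (missing stays NaN).
people["wage"] = pd.to_numeric(people["wage"].str.replace(",", "", regex=False))

# Both tables are in memory at once; a left merge keeps every person,
# and indicator=True records whether each row found a matching firm.
merged = people.merge(firms, on="firm_id", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "person_id"].tolist()

print(merged[["person_id", "wage", "industry"]])
print("Person IDs without a matching firm:", unmatched)
```

Here person 4 points to a firm_id that is missing from the firm table, so it is flagged rather than silently dropped, which is the same check many Stata users run with tab _merge.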

R (Tidyverse)

  • R's tidyverse packages are popular for data cleaning and visualization in academia (research).
  • R has a steeper learning curve: tidyverse syntax differs so much from base R that it feels like learning two separate languages.
  • Excellent community support on Stack Exchange (Stack Overflow).
  • More cutting-edge and experimental ML/predictive models.
  • Package management via Anaconda.
  • Integrated development environment (IDE): RStudio.
  • Make web apps using Shiny

MS Word and LaTeX

My research output goes to academic journals in health economics and health services research. These journals ask for .docx uploads so that changes can be tracked. LaTeX is a powerful typesetting language and offers a well-established syntax for math notation. While LaTeX is the gold standard for manuscript typesetting, I have reservations about its effectiveness for writing, and for collaborative writing in particular. I find LaTeX's markup syntax disruptive to writing, and it makes spelling and grammar checks difficult inside the code editor. The dependencies required to compile a LaTeX file are also an obstacle when collaborating with other users. Overleaf addresses these issues by moving the application to the cloud, but the complexity of LaTeX markup remains a barrier to collaboration with lay users.

Hardware and Operating System

Stata, Python, and R all support Windows, Mac, and Linux. With all three, I can run almost all of my analyses on a machine with 16GB of RAM; occasionally I need 32GB. Stata supports multi-core processing (Stata/MP). Python and R run on a single core by default (there are ways to use multiple cores, but I have yet to see a slick implementation). Therefore, once a machine has 16GB of RAM or more, the choice of hardware and OS comes down to user preference. Also, unlike machine learning workloads, econometric analysis in Stata, Python, and R does not currently require a GPU.
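As a rough way to check the 16GB versus 32GB question for a given dataset, the sketch below multiplies rows by columns by 8 bytes per numeric value. The copies multiplier is a hypothetical rule of thumb for the temporary copies made during merges and reshapes, not a measured figure, and the estimate ignores strings and the memory used by the OS and other programs.

```python
def estimate_dataset_gb(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size of an all-numeric table, in gigabytes (1e9 bytes).

    Assumes every column is stored as an 8-byte number (e.g. a double);
    strings, indexes, and per-object overhead are ignored.
    """
    return n_rows * n_cols * bytes_per_value / 1e9


def fits_in_ram(dataset_gb: float, ram_gb: float, copies: float = 1.5) -> bool:
    """Check whether a dataset fits in RAM, allowing for temporary copies.

    `copies` is a hypothetical rule-of-thumb multiplier, not a measured value.
    """
    return dataset_gb * copies <= ram_gb


# Example: a 50-million-row panel with 25 numeric columns.
size = estimate_dataset_gb(50_000_000, 25)
print(f"{size:.1f} GB")                         # 10.0 GB
print(fits_in_ram(size, ram_gb=16))             # True  (15 GB needed)
print(fits_in_ram(size, ram_gb=16, copies=2))   # False (20 GB needed)
print(fits_in_ram(size, ram_gb=32, copies=2))   # True
```

Under these assumptions, a dataset near the 10GB mark fits on a 16GB machine only if the workflow avoids making full copies, which is consistent with occasionally needing 32GB.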

In my experience, Stata's Linux support has minor issues. On Ubuntu, Stata 15 requires packages that are only available in Ubuntu 16 (libncurses5 and libpng12-0), so running Stata 15 on Ubuntu 18 or the latest Ubuntu 20 raises the error "libgtk-x11-2.0.so.0: cannot open shared object file". This error is not raised when installing Stata 15 on Ubuntu 16. On Debian, I was able to get Stata 15 running on Buster and Stretch by adding the Jessie repository.
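On a fresh Linux VM, one quick way to see whether the libraries behind the error above are present is Python's ctypes.util.find_library, which asks the system linker for a library by its short name. This is a sketch, not an official Stata installation check, and it covers only the GTK 2 and libpng12 libraries named in this section. libncurses5 is left out because find_library does not check version numbers, so a newer libncurses would also be reported as found.

```python
import ctypes.util

# Libraries named in the Ubuntu dependency discussion above.
# Keys are the short names ctypes.util.find_library expects ("lib" prefix and
# ".so" suffix dropped); values are the file or package names for display.
STATA15_LIBS = {
    "gtk-x11-2.0": "libgtk-x11-2.0.so.0 (GTK 2)",
    "png12": "libpng12-0",
}


def check_libraries(names):
    """Map each short library name to True if the system linker can find it."""
    return {name: ctypes.util.find_library(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_libraries(STATA15_LIBS).items():
        print(f"{STATA15_LIBS[name]:<30} {'found' if found else 'MISSING'}")
```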

My Development Environment

My current development environment is as follows.

Local Machine

  • Windows laptop
  • Stata 15
  • Anaconda as package manager for Python and Jupyter
  • JupyterLab (included in Anaconda) as development environment
  • Run Stata inside JupyterLab using stata_kernel

Cloud Machine

  • Ubuntu 16 virtual machine
  • Stata 15 for Linux
  • Miniconda as package manager for Python and Jupyter
  • JupyterLab (installed with conda, since Miniconda does not bundle it the way Anaconda does)
  • Run Stata inside JupyterLab using stata_kernel
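For the cloud setup, stata_kernel needs to know where the Stata executable lives. The sketch below writes a minimal config file, assuming stata_kernel reads a [stata_kernel] section with stata_path and execution_mode keys from ~/.stata_kernel.conf; check these names against the stata_kernel documentation for the version you install. The Stata path in the usage comment is hypothetical.

```python
import configparser
from pathlib import Path


def write_stata_kernel_conf(stata_path: str, conf_path: Path, mode: str = "console") -> Path:
    """Write a minimal stata_kernel config file and return its path.

    Assumption: stata_kernel reads `stata_path` and `execution_mode` from a
    [stata_kernel] section. Console mode is what it uses on Linux; the
    alternative automation mode is Windows-only.
    """
    config = configparser.ConfigParser()
    config["stata_kernel"] = {
        "stata_path": stata_path,
        "execution_mode": mode,
    }
    conf_path = Path(conf_path).expanduser()
    with conf_path.open("w") as fh:
        config.write(fh)
    return conf_path


# Example (hypothetical Stata 15 MP install path on the Ubuntu VM):
# write_stata_kernel_conf("/usr/local/stata15/stata-mp", Path("~/.stata_kernel.conf"))
```

Using configparser keeps the file in the plain INI format and avoids hand-editing on a headless VM.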

Conclusion

I have been a Stata user for over 10 years, starting in graduate school. I am happy to see Stata take small but solid steps with each upgrade. I have also seen rapid advances in Python and R that have made them useful tools for econometricians. Looking forward, I see opportunities to move computation to the cloud to improve collaboration among researchers working remotely. For example, Jupyter notebooks can be served in the cloud via Google Colaboratory. Along with Python and R, Stata should keep pace with advances in web technology.