What kind of hardware do you use for semantic work?

One thing I noticed early on is that semantic work has demanding hardware requirements; it's not hard to find applications that won't even run on a laptop with 8GB of RAM, and anything that involves a triple store and more than a few million triples benefits from as much RAM as you can get.

Today we can choose between workstations and laptops, and between multi-core CPUs and CPUs that are faster at single-threaded execution. There is now also a choice between solid-state disks and rotating hard drives, between hardware you own and cloud solutions, and among different cloud solutions. Even inside EC2 there are many instance types, and then there are other platforms like Heroku and Google AppEngine that offer restricted memory but might make up for it with massive parallelism.

What sort of hardware do you use for semantic work?

I mostly do offline processing, for which the RAM requirements are not so hefty (patience can be substituted for RAM). We only keep a few non-critical live services running. Other than Lucene and the occasional parser, I tend to write custom code from scratch, which lets me tailor the app to the machine (especially its RAM limitations).
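To make the RAM-for-patience trade concrete, here is a minimal sketch of a generic external merge sort in Python. It is not the poster's code; `external_sort` and the `max_lines_in_memory` budget are hypothetical names, and it assumes one statement per line (as in N-Triples). Sorted runs are written to temporary files and `heapq.merge` streams a k-way merge over them, so memory use is bounded by the budget rather than by the size of the dataset.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_lines: List[str], tmp_dir: str) -> str:
    """Write one sorted in-memory chunk (a "run") to a temp file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for line in sorted_lines:
            f.write(line + "\n")
    return path


def external_sort(lines: Iterable[str], max_lines_in_memory: int = 100_000) -> Iterator[str]:
    """Sort lines while buffering at most `max_lines_in_memory` of them in RAM.

    Phase 1 cuts the input into sorted runs on disk; phase 2 lazily merges
    the runs, keeping roughly one line per run in memory.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths: List[str] = []
        buffer: List[str] = []
        for line in lines:
            buffer.append(line.rstrip("\n"))
            if len(buffer) >= max_lines_in_memory:
                buffer.sort()
                run_paths.append(_write_run(buffer, tmp_dir))
                buffer = []
        if buffer:
            buffer.sort()
            run_paths.append(_write_run(buffer, tmp_dir))

        files = [open(p, encoding="utf-8") for p in run_paths]
        try:
            # Each generator binds its own file object, so the streams stay independent.
            streams = [(line.rstrip("\n") for line in f) for f in files]
            yield from heapq.merge(*streams)
        finally:
            # Close the runs before the temporary directory is removed.
            for f in files:
                f.close()
```

A smaller budget produces more runs and more disk I/O, which is exactly the time-for-memory trade. With very many runs a real implementation would merge in several passes to stay under the operating system's open-file limit.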

My laptop: for writing papers and running very small tasks; occasionally Minesweeper.

Ten or so shared-nothing servers, 7 years old, Ethernet, 4GB RAM, 2.2GHz, dual SATA: batch-processing jobs that are easy to divide and conquer (external sorts, crawls, scans, data analyses, indexing). Typically I throw together some custom RMI code to do the job. They process datasets in the low billions of statements. They're getting old, but they're still reliable workhorses for many offline processing tasks.
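A divide-and-conquer job on shared-nothing machines usually starts by routing each statement to exactly one machine. As a hypothetical illustration (not the poster's RMI code), this Python sketch partitions triples by a hash of the subject, so each server sees every statement about its subjects and can work without talking to the others. It uses `hashlib` rather than the built-in `hash()`, because Python randomizes string hashing per process and all machines must agree on the routing.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def partition_for(term: str, num_servers: int) -> int:
    """Map a term to a server index using a hash that is stable across processes."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def partition_by_subject(triples: Iterable[Triple], num_servers: int) -> Dict[int, List[Triple]]:
    """Group triples so that all statements sharing a subject go to the same server."""
    shards: Dict[int, List[Triple]] = defaultdict(list)
    for s, p, o in triples:
        shards[partition_for(s, num_servers)].append((s, p, o))
    return dict(shards)
```

In a real job each shard would be streamed to the corresponding server's disk (or sent over RMI) rather than collected in a dict; the in-memory grouping here only shows the routing rule.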

One server, quad-core, 64GB RAM: I haven't had a chance to use it yet, but I'm planning to run some RDF leaning on it soon, which will likely need a good chunk of RAM.
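Assuming "RDF leaning" here means computing the lean version of a graph (dropping triples made redundant by blank nodes, in the sense of the RDF Semantics), the following brute-force Python sketch shows why the task can be resource-hungry. It is a toy with hypothetical names, not the poster's planned implementation, and it assumes blank nodes are written with the `_:` prefix. It tries every mapping of blank nodes to terms in the graph, keeps any mapping whose image is a proper subgraph, and repeats until none exists.

```python
from itertools import product
from typing import Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes use the N-Triples "_:" prefix.
    return term.startswith("_:")


def apply_map(graph: Graph, h: Dict[str, str]) -> Graph:
    """Apply blank-node mapping h to every term; IRIs and literals stay fixed."""
    return frozenset(tuple(h.get(t, t) for t in triple) for triple in graph)


def shrinking_instance(graph: Graph) -> Optional[Graph]:
    """Return an instance of `graph` that is a proper subgraph of it, or None if lean."""
    bnodes = sorted({t for triple in graph for t in triple if is_bnode(t)})
    terms = sorted({t for triple in graph for t in triple})
    # Exhaustive search: |terms| ** |bnodes| candidate mappings.
    for targets in product(terms, repeat=len(bnodes)):
        image = apply_map(graph, dict(zip(bnodes, targets)))
        if image < graph:  # proper subset of the original triples
            return image
    return None


def lean(graph: Graph) -> Graph:
    """Keep replacing the graph with a smaller instance until it is lean."""
    while True:
        smaller = shrinking_instance(graph)
        if smaller is None:
            return graph
        graph = smaller
```

The search space grows as the number of terms raised to the number of blank nodes, so this only works on tiny graphs; deciding whether a graph is lean is known to be computationally hard in general, and practical approaches need much smarter search (and, as noted, plenty of RAM).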

  1. A 64 GB RAM, 16-core machine hosting the beta site (4.5 billion triples), with a RAID 10 array of 8 disks
  2. A 6 GB RAM, 3-core virtual machine for daily deployment of a small, frequently updated dataset
  3. A 6 GB RAM, 3-core virtual machine for continuous integration of the semantic software
  4. My workstation, a 4 GB RAM, 2-core machine, for development and small tests
  5. My laptop, an 8 GB RAM, 2-core machine, for development and smaller tests when I'm working out of the office

For production we will go with 512 GB RAM, 64 cores and 4 fast SAS disks.
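A quick back-of-envelope calculation puts the beta and production machines in perspective. This Python sketch simply divides RAM by triple count. It assumes decimal gigabytes, assumes production holds the same 4.5 billion triples as the beta site (the post doesn't say), and uses a hypothetical compact encoding of three 8-byte IDs (24 bytes) per triple, ignoring indexes and the term dictionary.

```python
GB = 10**9  # decimal gigabytes; binary GiB (2**30) would add about 7% to each figure
TRIPLES = 4.5e9  # beta-site size; assumed unchanged for production

# Hypothetical compact encoding: three 8-byte integer IDs per triple,
# ignoring indexes and the term dictionary.
ENCODED_TRIPLE_BYTES = 3 * 8


def bytes_per_triple(ram_bytes: float, num_triples: float) -> float:
    """RAM available per triple if all of the machine's memory went to the store."""
    return ram_bytes / num_triples


beta = bytes_per_triple(64 * GB, TRIPLES)  # about 14.2 bytes per triple
production = bytes_per_triple(512 * GB, TRIPLES)  # about 113.8 bytes per triple

print(f"beta: {beta:.1f} B/triple; one encoded copy fits: {beta >= ENCODED_TRIPLE_BYTES}")
print(f"production: {production:.1f} B/triple; encoded copies that fit: "
      f"{production / ENCODED_TRIPLE_BYTES:.1f}")
```

Under those assumptions the beta machine cannot hold even one encoded copy of the data in RAM, which is consistent with leaning on the RAID 10 array, while the production machine would have room for several copies (for example, a few index orderings).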