Hadoop @ TACC
Welcome to the Hadoop at TACC user group. This site collects the documents and guides for starting up and running Apache Hadoop at the Texas Advanced Computing Center (TACC). Hadoop is an open-source Java implementation of Google's MapReduce distributed computing framework.
Dec. 18, 2013 Notice: The Longhorn cluster will be decommissioned in early 2014, but don't panic: another MapReduce system is coming soon to TACC. Please stay tuned for further updates.
Introductory slides
- From High Performance Computing to Data-Intensive Computing: Analyzing Massive Datasets using Hadoop. Talk at TACC's 2011 Scientific Software Days event.
- Hadoop on TACC Longhorn Cluster
- See various slides from CS395T / INF385T / LIN386M: Data-Intensive Computing for Text Analysis (Fall 2011)
Join the TACC-Hadoop email discussion list (Google group)
Longhorn Cluster Structure
Start-up guide
- At a glance
- A simple recipe to run Hadoop at TACC
- Step-by-step Guide
- Create a user account at TACC
- Log in to Longhorn
- Install Hadoop in your TACC account
- Submit a job request
- VNC to TACC [Optional: to view Job Tracker and Hadoop Admin]
- Advanced Users
- Longhorn User Guide
Get your hands dirty
- Hello World! on Hadoop @ TACC
- Upload your data to a TACC computer
- Word Count Example on Shakespeare's Othello
- PART I: Create Your Own Jar
- PART II: Run a Jar on Hadoop
- The super easy way to test your Hadoop installation, build jars, etc.: Run wordcount with Taccdoop
Play-on guide
Troubleshooting
- When you cannot upload files to HDFS ← This happens a lot!
- org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hr4757/index/term-doc-vectors-forward-index.dat could only be replicated to 0 nodes, instead of 1
- Datanodes available: 0 (0 total, 0 dead)
- INFO ipc.Client: Retrying connect to server: c201-116/129.114.19.21:9000. Already tried 0 time(s).
- [Fatal Error] core-site.xml:1:1: Premature end of file
- Error occurred during initialization of VM
Background: Hadoop
- What is Hadoop? (Wikipedia)
- Hadoop Wiki
- Download Apache Hadoop
- How to set up Hadoop on a local computer for debugging and testing
- Hadoop in Python - Dumbo
- How to set up Eclipse with Hadoop (Courtesy of Dan Garrette)
Additional Information