Hadoop @ TACC
Welcome to the Hadoop at TACC user group. This site collects the documents and guides you need to start up and run Apache Hadoop at the Texas Advanced Computing Center (TACC). Hadoop is an open-source Java implementation of Google's MapReduce distributed computing framework.
Notice (Dec. 18, 2013): The Longhorn cluster will be decommissioned in early 2014. But don't panic: another MapReduce system is coming to TACC soon. Please stay tuned for further updates.
- From High Performance Computing to Data-Intensive Computing: Analyzing Massive Datasets using Hadoop. Talk at TACC's 2011 Scientific Software Days event.
- Hadoop on TACC Longhorn Cluster
- See various slides from CS395T / INF385T / LIN386M: Data-Intensive Computing for Text Analysis (Fall 2011)
Join the TACC-Hadoop email discussion list (Google group)
Longhorn Cluster Structure
- At a glance
Get your hands dirty
- Hello World! on Hadoop @ TACC
- Upload your data to a TACC computer
- Word Count Example on Shakespeare's Othello
- When you cannot upload files to HDFS ← This happens a lot!
- org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hr4757/index/term-doc-vectors-forward-index.dat could only be replicated to 0 nodes, instead of 1
- Datanodes available: 0 (0 total, 0 dead)
- INFO ipc.Client: Retrying connect to server: c201-116/18.104.22.168:9000. Already tried 0 time(s).
- [Fatal Error] core-site.xml:1:1: Premature end of file
- Error occurred during initialization of VM
- Download Apache Hadoop
- How to set up Hadoop on a local computer for debugging and testing
- Hadoop in Python - Dumbo
- How to set up Eclipse with Hadoop (courtesy of Dan Garrette)
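For readers new to MapReduce, the idea behind the word count example listed above can be sketched in a few lines of plain Python. This is only an in-memory illustration, not the code from the linked guide and not Hadoop or Dumbo API usage: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made up for this sketch.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Group values by key, mimicking what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the map, shuffle, and reduce steps over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its lines from HDFS.
    sample = ["the quick brown fox", "the lazy dog and the fox"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the mapper and reducer run on different nodes and the framework performs the grouping step; here everything happens in one process so the data flow is easy to follow.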