The Hadoop Distributed Filesystem (HDFS) is like any other filesystem, with a few exceptions. HDFS can be summed up by the three points below.
· It works best with very large files.
· It is designed for streaming access (write once, read many).
· It runs on commodity hardware.
It differs from a POSIX (Portable Operating System Interface) filesystem in how blocks are used: on a local disk, a file smaller than the disk block size (4 KB by default) still occupies the whole block, whereas in HDFS a 2 KB file stored in a block leaves the remaining space in that block free for other data.
HDFS uses a block size that is much larger than POSIX's, 128 MB by default. This means each file larger than 128 MB is divided into n blocks; each block is stored on a different datanode, and the namenode holds only the metadata (which datanode holds which block).
HDFS also has a concept of replication (the default is 3), meaning each block is stored in three different locations to handle fault tolerance. A secondary namenode periodically checkpoints the namenode's metadata, which helps recover from namenode failures.
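The replication factor and block size can also be set per file through the Java API. The snippet below is only a minimal sketch to illustrate the idea; the class name, path, and file contents are made up for the example.
package com.hadoop.tutorial;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class CreateWithBlockSize {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Arguments to create(): path, overwrite, buffer size, replication factor, block size.
        // Here the file is created with 3 replicas and a 128 MB block size (the defaults).
        FSDataOutputStream out = fs.create(new Path("/user/cloudera/sample.txt"),
                true, 4096, (short) 3, 128L * 1024 * 1024);
        out.write("hello hdfs\n".getBytes("UTF-8"));
        out.close();
    }
}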
HDFS’s fsck command understands blocks. For example, running:
% hdfs fsck / -files -blocks
lists the blocks that make up every file in HDFS.
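To also see which datanodes and racks hold each block, the -locations and -racks options can be added:
% hdfs fsck / -files -blocks -locations -racks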
HDFS CLI commands:
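A few commonly used HDFS shell commands, shown here for quick reference (the paths are only examples):
hadoop fs -ls /user/cloudera              # list a directory
hadoop fs -mkdir /user/cloudera/data      # create a directory
hadoop fs -put abc.txt /user/cloudera     # copy a local file into HDFS
hadoop fs -cat /user/cloudera/abc.txt     # print a file's contents
hadoop fs -get /user/cloudera/abc.txt .   # copy a file from HDFS to the local filesystem
hadoop fs -rm /user/cloudera/abc.txt      # delete a file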
Java API for HDFS:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/package-summary.html
I have written the code below, which reads a file from HDFS, prints it, writes a copy to a new HDFS location, and then deletes the original.
package com.hadoop.tutorial;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];   // source file in HDFS
        String dest = args[1];  // destination file in HDFS
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        FSDataOutputStream out = null;
        try {
            in = fs.open(new Path(uri));
            out = fs.create(new Path(dest));
            // Print the source file to standard output.
            IOUtils.copyBytes(in, System.out, 4096, false);
            // Seek back to the start and copy the contents to the destination file.
            in.seek(0);
            IOUtils.copyBytes(in, out, 4096, false);
            // Delete the source file.
            fs.delete(new Path(uri), true);
        } finally {
            IOUtils.closeStream(out);
            IOUtils.closeStream(in);
        }
    }
}
$ hadoop jar gettingstarted_mapreduce.jar com.hadoop.tutorial.FileSystemCat hdfs://quickstart.cloudera:8020/user/cloudera/abc.txt hdfs://quickstart.cloudera:8020/user/cloudera/xyz.txt
Coherency Model:
Once more than a block of data has been written to HDFS, the completed blocks are available for other users to read, but the block currently being written is not visible to readers.
HDFS provides a way to force all buffers to be flushed to the datanodes via the hflush() method on FSDataOutputStream. Once data is written and hflush() is called, it is available for readers; use hsync() to make sure it is also persisted to disk on the datanodes.
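A minimal sketch of how hflush() and hsync() might be used (the class name, path, and file contents are made up for the example):
package com.hadoop.tutorial;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HflushExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/user/cloudera/log.txt"));
        out.write("first record\n".getBytes("UTF-8"));
        out.hflush(); // flushed to the datanodes; new readers can now see this data
        out.write("second record\n".getBytes("UTF-8"));
        out.hsync();  // additionally forces the data to disk on the datanodes
        out.close();  // close() also flushes any remaining buffered data
    }
}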
Parallel Copying with distcp
One can run a MapReduce job to process and transfer data in parallel between two clusters, nodes, or directories by calling distcp:
hadoop distcp dir1 dir2
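distcp can also copy between two clusters using fully qualified HDFS URIs, and the -update option copies only the files that have changed; the hostnames below are only examples:
hadoop distcp hdfs://namenode1:8020/user/cloudera/dir1 hdfs://namenode2:8020/user/cloudera/dir1
hadoop distcp -update dir1 dir2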