Stream I/O
Introduction
Up until now, we have been working with programs that call classes,
manipulate data, and occasionally print something out. We have been
building classes and testing them with fixed values. There is a limit
to how useful such a program can be, as it has limited interaction
with the outside world. In the next few lectures we are going to
deal with how to get information into and out of our programs.
Today we are talking about stream I/O, which will allow us to
read from and write to files and terminal windows (as well as
network connections, as we'll see next week).
The Java stream library is particularly (and unnecessarily) complex,
so we will first discuss the general properties of stream I/O and then
map these ideas onto the Java implementation.
Where does stream data come from (or go to)?
- Files
- Terminal window: a special mechanism to handle typed input and display
output. (It emulates the notion of CRTs connected to a computer through
serial ports.)
- Network connections
- Printers and other devices (audio, USB, modem).
- Other programs (as in UNIX pipes '|').
Stream semantics
- Streams (in the most general sense) are a very common model of
I/O. The basic ideas of Streams are supported in most environments (even if
they are not called such).
- The basic idea of a data stream can be visualized as some
sort of pipe or hose for data. Your program has one end of this pipe, while
the other end is connected to the file or device you are interacting with.
Your program pushes data into the pipe, and it can be read from the other
end. Your program can also remove data from the pipe and use it.
- The primitive form of data supported by streams is usually bytes.
Any more complex data type (characters, Strings, integers, objects, etc.) must
be built up from bytes. This raises the issue of data formats, which we will
address later.
- There are 4 basic operations associated with streams, which you
should find in any implementation: open, close, read, write.
- stream = open(device or file name)
- Creates a new stream (sometimes implemented with a constructor
rather than an open() routine).
- Arguments will depend on what sort of device is being opened.
- Returns an error code or throws an error if the stream can't be opened.
- close(stream); -- closes the given stream. Closed streams
are no longer available for reading and writing.
- num = read(byte[] buf,int nbytes, Stream stream)
- reads up to 'nbytes' worth of data (bytes) from the stream into
buf.
- Returns number of bytes actually read.
- Data that has been read is removed from the stream, so the next read
will get different data (pipe metaphor).
- num = write(byte[] buf,int nbytes, Stream stream)
- writes up to 'nbytes' worth of data (bytes) from buf to the
stream.
- Returns number of bytes actually written.
- Blocking: If we are reading from a terminal window, a network,
or another program, it may happen that there is no input in the stream
to be read (for example, nothing has been typed at the terminal). In
this case most read() implementations will wait for some input to appear
before returning. The call is said to be "blocked". This is convenient, as
the program does not have to handle this case any differently; on the
other hand, it has implications which we will encounter later.
- The write(), open(), and close() operations can also block. This
is rare when dealing with local files, but common when dealing with
network connections or other programs.
- Sometimes a program does not want to block (because then it is stuck).
Most stream implementations provide a method (in Java called 'available()')
that indicates whether there is any data in the stream to read. In Java it
returns an estimate of how many bytes can be read without blocking.
- Buffering: Sometimes a program wants to read bytes 1 at a time.
However, the underlying mechanism for reading bytes from the operating
system (the ultimate source) is pretty expensive, and we would like to read
data in large chunks to minimize this overhead. To do this, most
stream implementations provide "buffered" reads and writes, which internally
read and write large chunks from the operating system to maximize
efficiency, yet let the programmer read and write (to the buffer)
in whatever sizes are convenient. In many stream implementations, buffering
is hidden or implicit; in Java it is explicit.
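The operations listed above map onto Java roughly as follows. This is only a minimal sketch: the class name StreamCopy and the method copy are made up for illustration, the 4096-byte buffer size is an arbitrary choice, and in-memory streams stand in for a real file or device so that the example runs anywhere. Note that Java's write() does not return a count; it either writes everything or throws an IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies everything from 'in' to 'out'; returns the number of bytes copied.
    // (Hypothetical helper, not part of the Java library.)
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        // read() blocks until some data is available and returns the number of
        // bytes actually read, or -1 once the end of the stream is reached.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n); // write only the bytes that were actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {10, 20, 30, 40, 50};
        // "open": constructing a stream object opens it.
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            // available() reports how many bytes can be read without blocking.
            System.out.println("available before read: " + in.available());
            System.out.println("copied " + copy(in, out) + " bytes");
        } // "close": try-with-resources closes both streams here
    }
}
```

With a real file, the same copy() method could be handed a FileInputStream wrapped in a BufferedInputStream; the loop itself would not change.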
Data Formats
In general, we do not want to read and write uninterpreted bytes. We are
interested in reading higher-level datatypes: integers, floats, characters,
objects, text strings, etc. This leads to the problem of how to read bytes
and assemble them into the datatypes we are interested in. In general,
to do this we need to know how the data is stored in the file we are
reading (or produced by the source we are reading from).
Note on binary format: You may be used to looking at text files. However,
storing large amounts of numerical or uninterpreted data (such as compiled
code) as text is very inefficient. For example, a Java int takes 4 bytes of
memory in binary form; if we store it in text form, it can take up to 11
bytes (10 digits plus a minus sign). Hence, non-text data is often stored by
directly dumping the memory representations into a file. These are known as
binary files.
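The size difference can be checked directly. The sketch below is illustrative only; the class IntSizes and its two methods are made-up names, and an in-memory stream is used in place of a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class IntSizes {
    // Number of bytes produced when the int is written in binary form.
    public static int binarySize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value); // always writes exactly 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.size();
    }

    // Number of bytes produced when the int is written as decimal text.
    public static int textSize(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        int value = 2000000000;
        System.out.println("binary: " + binarySize(value) + " bytes"); // binary: 4 bytes
        System.out.println("text:   " + textSize(value) + " bytes");   // text:   10 bytes
    }
}
```

Small values are shorter as text (the number 7 is a single character), but text files also need a separator between numbers, and the size varies from value to value, whereas the binary form is always 4 bytes.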
For example, an integer in Java is 4 bytes. This would usually be stored
as 4 consecutive bytes in a file, so we can read 4 bytes to construct our
integer. However, in which order should the bytes be assembled? It turns
out different machines and systems organize this differently. In some
languages, such as C, one actually has to assemble all of one's datatypes
by hand from bytes, and worry about what type of machine they were
originally written on. Fortunately, Java saves us much of that work
(at the price of complexity in the number of stream classes).
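To make the byte-order question concrete, here is a small sketch that assembles the same 4 bytes into an int in both orders. The class and method names are invented for this example. Java's DataInputStream.readInt() uses the first convention (most significant byte first, often called big-endian).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Most significant byte first ("big-endian").
    public static int fromBigEndian(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)  |  (b[3] & 0xFF);
    }

    // Least significant byte first ("little-endian").
    public static int fromLittleEndian(byte[] b) {
        return ((b[3] & 0xFF) << 24) | ((b[2] & 0xFF) << 16)
             | ((b[1] & 0xFF) << 8)  |  (b[0] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, 0x01, 0x02};
        System.out.println(fromBigEndian(raw));    // 258
        System.out.println(fromLittleEndian(raw)); // 33619968
        // The library can do the same assembly for us:
        System.out.println(ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt()); // 33619968
    }
}
```

The "& 0xFF" is needed because Java bytes are signed; without it, a byte such as 0x80 would be sign-extended and corrupt the higher bits of the result.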
This problem of data formats occurs in text strings as well. Many text
files in Western systems are stored using 1 byte per character. For
Western alphabets, this works fine and there are standard encodings
(such as ASCII and Latin-1). For non-Western languages, such as Chinese,
we need more bytes per character and the encoding becomes more complex.
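The effect of the encoding on size can be seen by converting a few one-character strings to bytes. This is a sketch with made-up names (EncodingDemo, encodedLength); the charsets themselves are standard ones from java.nio.charset.StandardCharsets, and UTF-8 is used here as one example of a multi-byte encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Number of bytes needed to store 'text' in the given encoding.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String a = "A";           // plain ASCII letter
        String eAcute = "\u00e9"; // e with acute accent, present in Latin-1
        String zhong = "\u4e2d";  // a Chinese character

        System.out.println(encodedLength(a, StandardCharsets.US_ASCII));        // 1
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(zhong, StandardCharsets.UTF_8));       // 3
    }
}
```

A reader therefore has to know which encoding a file was written in; the same bytes decoded with the wrong charset produce different characters.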
The Java stream library
Some examples
Read a line of input from the terminal and print it back.
// These classes live in java.io (import java.io.*;)
InputStream in1 = System.in; // byte stream connected to the terminal
InputStreamReader in2 = new InputStreamReader(in1); // converts bytes to Unicode characters
BufferedReader instream = new BufferedReader(in2); // adds buffering
// BufferedReader supports readLine(), so we can get a line (the newline is stripped).
// readLine() can throw IOException and returns null at end of input.
String line = instream.readLine();
System.out.println(line);
Read binary data from a file (note try/catch to handle IOExceptions)
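try (DataInputStream in =
        new DataInputStream(
            new BufferedInputStream(
                new FileInputStream("infile.bin")))) {
    int data = in.readInt(); // reads 4 bytes, most significant byte first
    System.out.println(data);
}
catch (IOException e) {
    // code to handle I/O exceptions (no file, file too short, etc.)
}
// the stream is closed automatically when the try block exits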
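For completeness, the writing side looks much the same with the output classes. The sketch below writes a few ints to a temporary file and reads them back; the class BinaryRoundTrip and its methods are invented for this example, and a temporary file is used so that nothing in the working directory is touched.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class BinaryRoundTrip {
    // Writes each int as 4 binary bytes.
    public static void writeInts(File file, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int v : values) {
                out.writeInt(v);
            }
        } // closing flushes the buffer to the file
    }

    // Reads 'count' ints back; the reader must know how the file was written.
    public static int[] readInts(File file, int count) throws IOException {
        int[] values = new int[count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < count; i++) {
                values[i] = in.readInt();
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("roundtrip", ".bin");
        file.deleteOnExit();
        writeInts(file, new int[] {1, -2, 300000});
        System.out.println(file.length());                       // 12
        System.out.println(Arrays.toString(readInts(file, 3)));  // [1, -2, 300000]
    }
}
```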
Tokenizing
One other class that is useful enough to be worth mentioning is StringTokenizer.
This class will break up a text line into a sequence of words (or a sequence of
tokens separated by some other character). See the description in the book.
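A short sketch of how StringTokenizer might be used is shown here. The wrapper class TokenizeDemo and its tokens() method are made up for illustration; StringTokenizer, hasMoreTokens(), and nextToken() are the real java.util API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {
    // Splits 'line' at any of the characters in 'delimiters'.
    public static List<String> tokens(String line, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delimiters);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("the quick  brown fox", " ")); // [the, quick, brown, fox]
        System.out.println(tokens("a,b,,c", ","));               // [a, b, c]
    }
}
```

Note that runs of delimiters are treated as a single separator, so no empty tokens are produced.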
File and directory manipulation
- Java has a class, File, that directly manipulates files and directories.
- It can test for existence, return permissions, test type, create, and delete.
- It can also deal with directories: creating directories, and the notions of
a current directory and a home directory.
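A few of these File calls are sketched below. The class FileDemo, the describe() helper, and the file names are invented for the example. The current and home directories are read here from the standard system properties user.dir and user.home, which is one common way to obtain them.

```java
import java.io.File;
import java.io.IOException;

public class FileDemo {
    // Classifies a path as "missing", "directory", or "file".
    public static String describe(File f) {
        if (!f.exists()) {
            return "missing";
        }
        return f.isDirectory() ? "directory" : "file";
    }

    public static void main(String[] args) throws IOException {
        System.out.println("current dir: " + System.getProperty("user.dir"));
        System.out.println("home dir:    " + System.getProperty("user.home"));

        // Work inside the system temporary directory (assumed to be writable).
        File dir = new File(System.getProperty("java.io.tmpdir"),
                            "filedemo-" + System.nanoTime());
        File file = new File(dir, "notes.txt");

        System.out.println(describe(dir));      // not created yet
        dir.mkdir();                            // create the directory
        file.createNewFile();                   // create an empty file inside it
        System.out.println(describe(dir));
        System.out.println(describe(file));
        System.out.println("readable: " + file.canRead()
                + ", writable: " + file.canWrite());

        file.delete();                          // clean up: file first, then directory
        dir.delete();
        System.out.println(describe(dir));
    }
}
```

A File object is only a name for a path; constructing one does not create or open anything on disk, which is why the calls that actually touch the file system are separate methods.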
Recitation
- Finish 'common tasks' and file/dir calls if not done
in lecture.