lecture9: Threads
Motivation
Most programs we have see so far had a single thread of control. In
sequential programs, execution starts at main and follows sequentially,
branching via control structures and method calls. In event-based
programs, control follows the event loop, which extracts events and
calls handlers. But there is still a single control thread,
if an event handler loops forever, the program locks up.
However, it is often convenient to have multiple threads of control; to
have the program doing more than one thing at once. For example:
- Blocking on network, while still processing user input.
- Performing background computation while still processing user input
- - Processing multiple parallel input streams, ie downloading multiple
files simultaneously.
There are several ways to accomplish this:
Use a variation on event-based programming in which control loops
over all of the task is the systems, running each until it blocks
or surrenders control. For I/O based programs where we are waiting
for multiple channels, we can sleep on a timer and periodically
check all sourcesfor something to do. This is called "polling".
These approaches in essences require building a mini operating
system into the program. The scheme has some disadvantages: the
tasks must remember to surrender control or other tasks are starved,
and the handler routines for each task cannot use the call stack
and local variables to store task state.
Since this is a common programming circumstance, most programming systems
now provide a capability to address these issues, usually called
threads.
Processes:
To understand threads, we start with processes. Processes are the
operating systems notion of 'program'. A processes state consists
of several portions.
- The program code -- often called the text segment
- Global data -- including the heap
- Program counter, address ot current instruction
- Stack and stack pointer -- call frames holding local variables,
arguments and return points.
- open file descriptors and memory management resources,
Modern operating systems manage multiple processes. Each process in
effect sees it's own version of the computer and is isolated from
the other processes. Computing resources and shared via 'time-slicing'.
A module in the OS, called the scheduler, runs each process in
turn for a certain amount of time. Then the state of that process is
stored, and the next process is run (storing process state and swapping
processes involve major programming magic). There are two types of
schedulers, pre-emptive and non-pre-emptive. In non-pre-emptive
scheduling, a process runs until it blocks or surrenders control.
In pre-emptive scheduling, the OS enforces time limits on running
times and can interrupt a process in the middle of computation and
let others run.
Threads
- Threads are similar in spirit to processes, but more lightweight.
- A thread consists of the Program Counter and Call Stack portion of
a process state.
- These portions of the process state determine where the control is
in the process and what the program is doing,
- These portions of the process state can be replicated allowing
multiple simultaneous control threads.
- The processes code, global data, and heap are all shared between the
various threads. (The prosents some issues we'll address later).
- Memory picture:
Java Threads
- Threads are supported in Windows, various UNIX flavors, and in Java. All
of these work basiclly the same, though the Java system is of course
Object-Oriented rather than procedure-based.
- Java threads are based on one class and one interface.
- Thread class - represents a thread of control
- Runnable interface - abstracts thread code.
- consist of run() method.
- To run a task in a new thread, there are two choices.
- Inherit from Thread object and override run() method with
task code.
- Create a new object that implements Runnable() with again run()
containing the task code. Create a new Thread with this object
as an argument.
- Starting threads.
- Creating the Thread object does not start it running. One
must call the start() method to start execution.
- Thread state.
- Threads have several states and transition based on method calls
or OS events (unblocking).
- Thread state diagram.
- Killing threads
- die naturally when run() returns. Can be externally killed.
- Example: Inheriting
class Worker extends Thread{
// .. Worker instance vars here
run(){
// .. Worker code here
}
}
class Test{
public static void main(String[] args){
// .. init code
Worker worker = new Worker();
worker.start();
// .. Main thread continues here
}
}
- Example: Implementing Runnable
class Worker implements Runnable{
// .. Worker instance vars here
run(){
// .. Worker code here
}
}
class Test{
public static void main(String[] args){
// .. init code
Worker worker = new Worker();
Thread mythread = new Thread(worker);
mythread.start();
// .. Main thread continues here
}
}
Java Thread Scheduling
- Java threads NOT necessarily pre-emptively scheduled. Depends on
platform.
- Threads should call sleep() or yield() to allow other threads to run.
A portable implementation should plan for worst case. This is pretty
lame and a real pain.
- Threads have priorities. Higher priority thread run before lower.
Don't rely on this for anything, implementation is spotty.
Synchronization
- Problem: Since global data is shared among threads, we need to
be careful if two threads try to access the same data. Especially,
in the case of pre-emptive scheduling.
- Example:
a = a + 1; // simple increment of a
- Increment consist of 2 steps (different instructions),
1) read old value and increment
2) write back new value.
- With pre-emptive scheduling a thread can be interrupted after step
1) and the other thread runs. There are several possible execution
orders, ignoring symmetry.
- T1I1 T1I2 T2I1 T2I2 , T1I1 T2I1 T1I2 T2I2, T1I1 T2I1 T2I2 T1I2
- if a initally = 3 these end up with a = 5 , 4 , 4 respectively.
- However, when we program we want the same answer all of the time.
- Solution: synchronize thread execution. Several technologies:
- Critial sections
- mutexes
- semaphores
- monitor
- Basicly same idea: OS intervenes to lock code or data so only one
thread at a time can run.
Java Synchronization
- synchronized methods - use synchronized keyword
- When a thread calls a synchronized method, that object instance is
locked by that thread. Any other thread attempting to call ANY other
synchronized method on that object will block until the first thread
exits.
- Example:
class Test{
int a=3;
public synchronized inc(){
a = a + 1; // syncronize increment
}
}
- In the fragment above, only one thread allowed in inc() at a time.
Results are therefore deterministic (a good thing).
Deadlock
- threads can lock multiple objects by through nested calls to synchronized
routines.
- threads can lock the same object twice through nested calls. Object is not
freed until last call returns.
- In systems other than Java, it is important to make sure Exception
handlers and all return paths unlock all objects. In Java this is
automatic.
- Still a problem, two synch routines a() and b()
Thread 1 calls a() then b();
Thread 2 calls b() then a();
- If scheduling is such that each thread 1st call happem before the
second, each thread is waiting on the other and will wait forever!
Don't do this.
- An straightforward algorithm for avoiding this: If threads must
lock multiple objects, they always request them in the same order.
If both threads in example call a() first, one succeed the the other
waits. The winner also gets b(), no deadlocks.
- Other things to avoid:
- blocking on I/O in synchronized methods, unless you really want this.
More Thread Notes;
- Synchronization routines are high overhead. Use sparingly.
- Swing is in general NOT sychronized. Designate one thread to update GUI
Common Thread Patterns
- I/O Handling - file download/processing while still responding to GUI
- Parallel - multiple instance of common activity (eg file download)
- Pipeline - multi-stage computation with sources, sink, filters, data-flow
- Worker/Background - background computation while responding to GUI and
networks
Recitation
Review thread and synchronization
Introduce explicit wait(), notify() and advanced threading
Pipeline Streams