Using the Oracle Bulk Loader 


Overview

To use the Oracle bulk loader, you need a control file, which specifies how data should be loaded into the database; and a data file, which specifies what data should be loaded.  You will learn how to create these files in turn.


Creating the Control File

A simple control file has the following form:
LOAD DATA
INFILE <dataFile>
APPEND INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of all attribute names to load>)
As a concrete example, here are the contents of a control file test.ctl:
LOAD DATA
INFILE test.dat
INTO TABLE test
FIELDS TERMINATED BY '|'
(i, s)


Creating the Data File

Each line in the data file specifies one tuple to be loaded into <tableName>. It lists, in order, values for the attributes in the list specified in the control file, separated by <separator>. As a concrete example, test.dat might look like:
1|foo
2|bar
3| baz
Recall that the attribute list of test specified in test.ctl is (i, s), where i has the type int, and s has the type char(10). As the result of loading test.dat, the following tuples are inserted into test:
(1, 'foo')
(2, 'bar')
(3, ' baz')
Warning: Note that the third line of test.dat has a blank after "|". This blank is not ignored by the loader. The value to be loaded for attribute s is ' baz', a four-character string with a leading blank. It is a common mistake to assume that 'baz', a three-character string with no leading blank, will be loaded instead. This can lead to some very frustrating problems that you will not notice until you try to query your loaded data, because ' baz' and 'baz' are different strings.


Loading Your Data

The Oracle bulk loader is called sqlload. It is a UNIX-level command, i.e., it should be issued directly from your UNIX shell, rather than within sqlplus. A bulk load command has the following form:
sqlload userid=<yourName>/<yourPasswd> control=<ctlFile> log=<logFile>
where <yourName> is your Oracle login and <yourPasswd> is your Oracle password. The safer form of login, where you give only your login name and let the system prompt you for password applies to sqlload as well. <ctlFile> is the name of the control file. If no file name extension is provided, sqlload will assume the default extension ".ctl". The name of the data file is not needed on the command line because it is specified within the control file. Finally, you may designate <logFile> as the log file. If no file name extension is provided, ".log" will be assumed. sqlload will fill the log file with relevant information about the bulk load operation, such as the number of tuples loaded, and a description of errors that may have occurred.

As a concrete example, if sally, with password etaoinshrdlu, wishes to run the control file test.ctl and have the log output stored in test.log, then she should type

sqlload userid=sally/etaoinshrdlu control=test.ctl log=test.log


Loading Without a Separate Data File

It is possible to use just the control file to load data, instead of using a separate data file. Here is an example:
LOAD DATA
INFILE *
INTO TABLE test
FIELDS TERMINATED BY '|'
(i, s)
BEGINDATA
1|foo
2|bar
3| baz
The trick is to specify "*" as the name of the data file, and use BEGINDATA to start the data section in the control file.


Loading DATE Data

The DATE datatype can have its data loaded in a format you specify with considerable flexibility. First, suppose that you have created a relation with an attribute of type DATE:
CREATE TABLE foo (
    i int,
    d date
);
In the control file, when you describe the attributes of foo being loaded, you follow the attribute d by its type DATE and a date mask. A date mask specifies the format your date data will use. It is a quoted string with the following conventions: Here is an example control file:
LOAD DATA
INFILE *
INTO TABLE foo
FIELDS SEPARATED BY '|'
(i, d DATE 'dd-mm-yyyy')
BEGINDATA
1|01-01-1990
2|4-1-1998
Notice that, as illustrated by the second tuple above, a field can be shorter than the corresponding field in the date mask. The punctuation "-" tells the loader that the day and month fields of the second tuple terminate early.