CSc 8711. Databases and the Web
Spring 2004, Programming Project 3 (XML Parsing, Web Search/Browse)
Due: Thursday, March 4, 2004
This project consists of the following parts:
Part I (XML Parsing)
Write a Java application that uses the SAX Parsing technique to read
bibliography data from SIGMOD's XML
data set and store the data into an Oracle database. Two types of XML files
will be input: OrdinaryIssuePage (sample XML data file)
and ProceedingsPage (sample XML data file).
To get the entire set, you may visit http://www.acm.org/sigmod/record/xml/
and download the Nov2002 zip file.
The DTDs for these two XML files are available at
OrdinaryIssuePage.dtd and
ProceedingsPage.dtd.
The relational database schema into which the data will be loaded is as follows:
drop table volumes cascade constraints;
create table volumes (
vno number(3),
num number(3),
month varchar2(20),
year number(4),
primary key (vno,num)
);
--
-- sample insert
-- insert into volumes values (23,3,'September',1994);
-- Only high level volume information is kept here.
--
drop table ordinaryIssues cascade constraints;
create table ordinaryIssues (
vno number(3),
num number(3),
notes varchar2(1024),
primary key (vno,num,notes)
foreign key (vno,num) references volumes
);
--
-- sample insert
-- insert into ordinaryIssues values (23,3,null);
-- notes go here; could be more than one note although I did
-- not see one with two notes but DTD suggests that there could be
--
drop table proceedings cascade constraints;
create table proceedings (
vno number(3),
num number(3),
confYear number(4),
conference varchar2(256),
datte varchar2(20),
location varchar2(100),
primary key (vno,num),
foreign key (vno,num) references volumes
);
--
--sample insert
-- insert into proceedings values (15,2,1986,
-- 'ACM SIGMOD International Conference on Management of Data',
-- 'May 28-30, 1986', 'Washington, D.C., USA');
-- information about conference goes here
--
drop table articles cascade constraints;
create table articles (
acode number(7),
vno number(3),
num number(3),
title varchar2(256),
primary key (acode),
foreign key (vno,num) references volumes
);
--
-- sample insert from ordinaryIssue
-- insert into articles values (233208,23,3,
-- 'Response to the March 1994 ODMG-93 Commentary Written ....');
-- sample insert from proceedings
-- insert into articles values (152009,15,2,
-- 'Load balancing in a locally distributed DB system');
--
drop table oArticles cascade constraints;
create table oArticles (
acode number(7),
section varchar2(50),
href varchar2(256),
format varchar2(20),
primary key (acode),
foreign key (acode) references articles
);
--
-- sample insert
-- insert into oArticles values (233208,'CORRESPONDENCE','http://...',
-- 'Postscript');
-- more details about ordinary issue article goes here
-- if more than one href, use only the first
--
drop table pArticles cascade constraints;
create table pArticles (
acode number(7),
initPage number(4),
endPage number(4),
addMaterialHref varchar2(256),
addMaterial varchar2(50),
fullTextHref varchar2(256),
format varchar2(50),
fullTextSize number(10),
primary key (acode),
foreign key (acode) references articles
);
--
-- sample insert
-- insert into pArticles values (152009,108,119,
-- 'http://....','Index Terms','http://...','Full Text in PDF Format',
-- 1462);
-- more details on proceedings articles - note size is in KB
-- if more than one hrefs, use only one.
--
drop table authors cascade constraints;
create table authors (
acode number(7),
authorPosition number(3),
authorName varchar2(50),
primary key (acode,authorPosition),
foreign key (acode) references articles
);
--
-- sample inserts
-- insert into authors values (233208,0,'Object Database Management Group');
-- insert into authors values (152009,1,'Michael J. Carey');
-- insert into authors values (152009,2,'Hongjun Lu');
--
Part II (Browse and Search Servlets)
Using Java servlets that access the Oracle database created in Part I,
implement a browse and a search page. The browse page should be similar to
what you see
here - Only the middle part with the listing of each year and the volumes
within the year. For the search interface, please design something suitable
with search text boxes for title, author, and pulldown lists for
year, ordinary Issue or Proceedings, etc.