CSc 8711. Databases and the Web
Spring 2004, Programming Project 3 (XML Parsing, Web Search/Browse)
Due: Thursday, March 4, 2004

This project consists of the following parts:

Part I (XML Parsing)

Write a Java application that uses the SAX Parsing technique to read bibliography data from SIGMOD's XML data set and store the data into an Oracle database. Two types of XML files will be input: OrdinaryIssuePage (sample XML data file) and ProceedingsPage (sample XML data file). To get the entire set, you may visit http://www.acm.org/sigmod/record/xml/ and download the Nov2002 zip file. The DTDs for these two XML files are available at OrdinaryIssuePage.dtd and ProceedingsPage.dtd. The relational database schema into which the data will be loaded is as follows:
drop table volumes cascade constraints;
create table volumes (
  vno       number(3),
  num       number(3),
  month     varchar2(20),
  year      number(4),
  primary key (vno,num)
);
--
-- sample insert
-- insert into volumes values (23,3,'September',1994);
-- Only high level volume information is kept here.
--
drop table ordinaryIssues cascade constraints;
create table ordinaryIssues (
  vno       number(3),
  num       number(3),
  notes     varchar2(1024),
  primary key (vno,num,notes)
  foreign key (vno,num) references volumes
);
--
-- sample insert
-- insert into ordinaryIssues values (23,3,null);
-- notes go here; could be more than one note although I did
-- not see one with two notes but DTD suggests that there could be
--
drop table proceedings cascade constraints;
create table proceedings (
  vno         number(3),
  num         number(3),
  confYear    number(4),
  conference  varchar2(256),
  datte       varchar2(20),
  location    varchar2(100),
  primary key (vno,num),
  foreign key (vno,num) references volumes
);
--
--sample insert
-- insert into proceedings values (15,2,1986,
--  'ACM SIGMOD International Conference on Management of Data',
--  'May 28-30, 1986', 'Washington, D.C., USA');
-- information about conference goes here
--
drop table articles cascade constraints;
create table articles (
  acode   number(7),
  vno     number(3),
  num     number(3),
  title   varchar2(256),
  primary key (acode),
  foreign key (vno,num) references volumes
);
--
-- sample insert from ordinaryIssue
-- insert into articles values (233208,23,3,
--   'Response to the March 1994 ODMG-93 Commentary Written ....');
-- sample insert from proceedings
-- insert into articles values (152009,15,2,
--    'Load balancing in a locally distributed DB system');
-- 
drop table oArticles cascade constraints;
create table oArticles (
  acode    number(7),
  section  varchar2(50),
  href     varchar2(256),
  format   varchar2(20),
  primary key (acode),
  foreign key (acode) references articles
);
--
-- sample insert
-- insert into oArticles values (233208,'CORRESPONDENCE','http://...',
--    'Postscript');
-- more details about ordinary issue article goes here
-- if more than one href, use only the first
--
drop table pArticles cascade constraints;
create table pArticles (
  acode   number(7),
  initPage number(4),
  endPage number(4),
  addMaterialHref varchar2(256),
  addMaterial  varchar2(50),
  fullTextHref  varchar2(256),
  format  varchar2(50),
  fullTextSize  number(10),
  primary key (acode),
  foreign key (acode) references articles
);
--
-- sample insert
-- insert into pArticles values (152009,108,119,
--   'http://....','Index Terms','http://...','Full Text in PDF Format',
--   1462);
-- more details on proceedings articles - note size is in KB
-- if more than one hrefs, use only one.
--
drop table authors cascade constraints;
create table authors (
  acode   number(7),
  authorPosition  number(3),
  authorName varchar2(50),
  primary key (acode,authorPosition),
  foreign key (acode) references articles
);
--
-- sample inserts
-- insert into authors values (233208,0,'Object Database Management Group');
-- insert into authors values (152009,1,'Michael J. Carey');
-- insert into authors values (152009,2,'Hongjun Lu');
--

Part II (Browse and Search Servlets)

Using Java servlets that access the Oracle database created in Part I, implement a browse and a search page. The browse page should be similar to what you see here - Only the middle part with the listing of each year and the volumes within the year. For the search interface, please design something suitable with search text boxes for title, author, and pulldown lists for year, ordinary Issue or Proceedings, etc.