The first part requires each of you to collect data on one movie from
http://us.imdb.com/top_250_films.
The format of the data should be as follows (note: there should be no
space after the opening < symbol in the start and end tags; I could
not get rid of it in the HTML source!):
< db>
< movies>
< movie id="godfatherthe">
< title>Godfather, The< /title>
< year>1972< /year>
< directors>
< director idref="francisfordcoppola"/>
< /directors>
< genres>
< genre>Crime< /genre>
< genre>Drama< /genre>
< /genres>
< plot>A Mafia boss' son, previously uninvolved in the business, takes over when his father is critically wounded in a mob hit.
< /plot>
< cast>
< performer>
< actor idref="marlonbrando"/>
< role>Don Vito Corleone< /role>
< /performer>
< performer>
< actor idref="alpacino"/>
< role>Michael Corleone< /role>
< /performer>
< performer>
< actor idref="dianekeaton"/>
< role>Kay Adams Corleone< /role>
< /performer>
< performer>
< actor idref="robertduvall"/>
< role>Tom Hagen< /role>
< /performer>
< performer>
< actor idref="jamescaan"/>
< role>Santino "Sonny" Corleone< /role>
< /performer>
< /cast>
< /movie>
...
...
< /movies>
< performers>
< performer id="marlonbrando">
< name>Marlon Brando Jr.< /name>
< dob>3 April, 1924< /dob>
< pob>Omaha, Nebraska< /pob>
< actedin>
< movie idref="godfatherthe"/>
< movie idref="apocalypsenow"/>
< /actedin>
< directs>
< movie idref="xx"/>
< /directs>
< /performer>
...
...
< /performers>
< /db>
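The id values in the sample (godfatherthe, francisfordcoppola, marlonbrando) look like the title or name written in lowercase with spaces and punctuation removed. The assignment does not state this rule, so the following Python sketch is only an assumption drawn from those examples; make_id is a hypothetical helper name, not part of the assignment.

```python
import re

def make_id(name):
    """Lowercase the name and drop everything except ASCII letters and digits.

    Assumption: this mirrors the sample ids; the assignment gives no explicit rule.
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())

print(make_id("Godfather, The"))        # godfatherthe
print(make_id("Francis Ford Coppola"))  # francisfordcoppola
print(make_id("Marlon Brando"))         # marlonbrando
```

The sample id marlonbrando omits the "Jr." that appears in the name element, so the short form of the name is passed here. A title such as 12 Angry Men would yield an id starting with a digit, which would not be a valid value if id were declared as type ID in a DTD; the assignment does not say how to handle that case.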
The movies are assigned as follows:
cscangx http://us.imdb.com/Title?0068646 Godfather, The (1972)
cscanwx http://us.imdb.com/Title?0111161 Shawshank Redemption, The (1994)
cscbnlx http://us.imdb.com/Title?0108052 Schindler's List (1993)
cscbnwx http://us.imdb.com/Title?0033467 Citizen Kane (1941)
cschhwx http://us.imdb.com/Title?0034583 Casablanca (1942)
cschnkx http://us.imdb.com/Title?0071562 Godfather: Part II, The (1974)
cschnwx http://us.imdb.com/Title?0076759 Star Wars (1977)
cscjghx http://us.imdb.com/Title?0047478 Shichinin no samurai (1954)
cscjngx http://us.imdb.com/Title?0209144 Memento (2000)
cscjshx http://us.imdb.com/Title?0073486 One Flew Over the Cuckoo's Nest (1975)
csckdnx http://us.imdb.com/Title?0057012 Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)
csclnlx http://us.imdb.com/Title?0082971 Raiders of the Lost Ark (1981)
csclnmx http://us.imdb.com/Title?0047396 Rear Window (1954)
cscmntx http://us.imdb.com/Title?0169547 American Beauty (1999)
cscnnnx http://us.imdb.com/Title?0114814 Usual Suspects, The (1995)
cscpnbx http://us.imdb.com/Title?0080684 Star Wars: Episode V - The Empire Strikes Back (1980)
cscpnxx http://us.imdb.com/Title?0054215 Psycho (1960)
csctnsx http://us.imdb.com/Title?0110912 Pulp Fiction (1994)
cscvcsx http://us.imdb.com/Title?0102926 Silence of the Lambs, The (1991)
cscvnbx http://us.imdb.com/Title?0053125 North by Northwest (1959)
cscwncx http://us.imdb.com/Title?0038650 It's a Wonderful Life (1946)
cscxncx http://us.imdb.com/Title?0190332 Wo hu cang long (2000)
cscxnfx http://us.imdb.com/Title?0099685 Goodfellas (1990)
cscxnhx http://us.imdb.com/Title?0056172 Lawrence of Arabia (1962)
cscyglx http://us.imdb.com/Title?0050083 12 Angry Men (1957)
cscyllx http://us.imdb.com/Title?0120815 Saving Private Ryan (1998)
mathqnx http://us.imdb.com/Title?0052357 Vertigo (1958)
cscpnhx http://us.imdb.com/Title?0075314 Taxi Driver (1976)
cscmasx http://us.imdb.com/Title?0119488 L.A. Confidential (1997)
By Thursday (20th Sept), I would like each of you to create the
movie element for the movie assigned to you, along with the performer elements for its top 5 performers.
Please prepare two separate files, movie.xml and performer.xml,
and keep them in your home directories for me to copy and integrate. The movie.xml file should
contain one movie element, and performer.xml should contain 5 performer elements (without the enclosing
performers tag). Also, in a separate text file called performers.txt, please include the idref (one per line)
of each of your performers. I will use this file to eliminate duplicates when I integrate the data.
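Before leaving the files in your home directory, it may help to check that the three files agree with each other. The Python sketch below is one possible self-check, not a required step: check_submission and its helpers are hypothetical names, and it assumes the files use real tags (no space after <) and that performer.xml has no XML declaration, so the root-less fragment can be wrapped in a temporary performers element for parsing.

```python
import xml.etree.ElementTree as ET

def performer_ids(fragment):
    """Return the id of every performer element in a root-less fragment."""
    # performer.xml has no enclosing performers tag, so wrap it before parsing.
    root = ET.fromstring("<performers>" + fragment + "</performers>")
    return [p.get("id") for p in root.findall("performer")]

def actor_idrefs(movie_xml):
    """Return the idref of every actor in the cast of a single movie element."""
    movie = ET.fromstring(movie_xml)
    return [a.get("idref") for a in movie.findall("./cast/performer/actor")]

def check_submission(movie_xml, performer_fragment, performers_txt):
    """Return a list of problems; an empty list means the files are consistent."""
    problems = []
    ids = performer_ids(performer_fragment)
    listed = [line.strip() for line in performers_txt.splitlines() if line.strip()]
    if len(ids) != 5:
        problems.append(f"expected 5 performer elements, found {len(ids)}")
    if sorted(ids) != sorted(listed):
        problems.append("performers.txt does not match the ids in performer.xml")
    for ref in actor_idrefs(movie_xml):
        if ref not in ids:
            problems.append(f"cast idref '{ref}' has no performer element")
    return problems

# Demo data built from the ids in the sample above; in practice, read the
# contents of movie.xml, performer.xml and performers.txt from disk instead.
ids = ["marlonbrando", "alpacino", "dianekeaton", "robertduvall", "jamescaan"]
fragment = "".join(f'<performer id="{i}"/>' for i in ids)
movie = (
    '<movie id="godfatherthe"><cast>'
    + "".join(f'<performer><actor idref="{i}"/></performer>' for i in ids)
    + "</cast></movie>"
)
txt = "\n".join(ids)
print(check_submission(movie, fragment, txt))  # []
```

The demo performer elements carry only an id to keep the example short; real ones would also contain name, dob, pob, actedin and directs as shown in the format above.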