bigshot
Senior HTF Member
- Joined
- Jan 30, 2008
- Messages
- 2,933
- Real Name
- Stephen
I've been asked to talk about the media server project that I'm overseeing for a non-profit digital archive where I serve on the Board of Directors. Our users are primarily film/animation students and artists, and our digital assets fill nearly 100 TB of disk space. Since we are a non-profit educational organization, we can operate under certain fair use exclusions in the Digital Millennium Copyright Act that might not apply to individuals or for-profit businesses, and obviously the scale we are working at is far beyond the capabilities of most home theater owners. But perhaps some info on how we built our servers might help people design a smaller system for their own home.
We currently have two servers... the primary media library is a custom built database designed to contain 1) biographical information on film makers and artists, 2) high resolution scans of images (photos, artwork, digitized books, etc.), and 3) digitized films. Every asset is optimized for the smallest possible file size, and has a file name that follows a unique identifying naming convention, which allows us to build cross references between different kinds of data... For instance, you can look at the biography of a film maker and see a list of the films he worked on. Click on "film cross links" and view a digitized copy of the films by that film maker in the collection. Then click on "media cross links" and see photos and artwork related to the film maker or specific film. This is a way of organizing media for researchers and students that is right on the cutting edge of library science. It requires a LOT of volunteer man hours in digitizing, cataloging and tagging. It's far beyond the ability of any individual. We've been working on this project for over a decade and have a crew of volunteers who build it digital "brick" by digital "brick". The primary database contains hundreds of thousands of files that all have to be instantly searchable, so we create "work copies" of every asset at a reduced resolution or as compressed files. We back up higher resolution lossless copies on our secondary server, and those can easily be rolled in to upgrade the resolution or compression settings as technology advances and computers and hard drives get faster.
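To give a rough sense of how ID-based cross linking like this can work, here is a minimal sketch. The naming convention, ID codes and metadata fields below are hypothetical illustrations, not our actual scheme:

```python
# Sketch of an ID-based cross-referencing scheme. Each asset's file name
# embeds a unique ID (hypothetical codes: "P0042" = person record 42,
# "F0107" = film 107), and its metadata links it to related records.
assets = {
    "P0042_bio.txt":   {"type": "bio",   "person": "P0042"},
    "F0107_film.mkv":  {"type": "film",  "person": "P0042", "film": "F0107"},
    "F0107_still.png": {"type": "image", "film": "F0107"},
}

def cross_links(key, value):
    """Return every asset file whose metadata links to the given ID."""
    return sorted(f for f, meta in assets.items() if meta.get(key) == value)

# From a filmmaker's biography, list their films in the collection...
films = cross_links("person", "P0042")
# ...then from a film, list related photos and artwork.
media = cross_links("film", "F0107")
```

The point is that the cross links fall out of the naming/metadata convention itself, so new assets become browsable the moment they are tagged.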
Our secondary video server would probably be of greatest interest to home theater folks... We originally started with a library of DVDs that numbered well over 10,000 titles. Storage, organizing and access to all of this physical media was a huge challenge... shelves and shelves of stuff. How to organize it? Alphabetical? By subject? It spent most of its time sitting on the shelf because it was too unwieldy to be accessible. Rather than have it all collect dust, I worked with a couple of my volunteers on a plan to get it into circulation. First I removed all the cases and sleeved the discs, but even then it was very difficult to find specific titles that the students and media curators were interested in accessing. The solution was to rip the DVDs losslessly to digital files and organize them on a disk array with a media server that automatically catalogs and plays back the files. We use both Plex and XBMC/Kodi and it has streamlined access to the secondary library tremendously. This gives us an easy way to curate and process the material so we can edit it, tag it and transfer it into the main media library as volunteer time allows. The secondary server acts as the "archival copy" and we use the files here for screenings and events where maximum resolution is needed. At this point, most of the video is 480p, but I have started to try to bring in HD content as well. The problem is the file sizes involved. It's difficult to maintain lossless HD video with the number of titles we need to archive, so I've been experimenting with compression. The archivist in me wants the secondary server to be completely lossless, but it just isn't practical right now. That's a problem to be worked out in the future as technology advances.
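To show why lossless HD isn't practical at this scale, here's some back-of-envelope storage math. The bitrates are illustrative assumptions (roughly typical for lossless 1080p capture vs. a compressed H.264 encode), not measured figures from our collection:

```python
# Back-of-envelope storage math for a large video library.
# Bitrates are illustrative assumptions, not measured figures.

def library_size_tb(titles, minutes_per_title, mbit_per_s):
    """Total storage in TB for a library at a given average bitrate."""
    seconds = titles * minutes_per_title * 60
    bits = seconds * mbit_per_s * 1_000_000
    return bits / 8 / 1e12  # bits -> bytes -> terabytes

# 10,000 titles averaging 90 minutes each:
lossless_hd = library_size_tb(10_000, 90, 400)  # ~2,700 TB at 400 Mbit/s
compressed_hd = library_size_tb(10_000, 90, 8)  # ~54 TB at 8 Mbit/s
```

Even with generous assumptions, a fully lossless HD library of this size runs to thousands of terabytes, which is why compression is the only workable option for now.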
The basic workflow for video is like this... all of these steps are always going on. We don't do one thing at a time. It's an ongoing process.
- Digitization: Capturing video from film/tape, ripping discs to various file formats depending on the source
- Tagging: Labeling the files so Plex/Kodi can parse the data and they can be added to the secondary archive
- Curation: Review of the raw library to call for specific info to add to the primary database
- Processing: Editing video to extract the material called for by the curator
- Formatting: Converting and compressing files for inclusion in the primary database
- Tagging: File naming convention, cross linking with image and biographical data in the primary database
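The steps above can be sketched as a simple pipeline. The function names and record fields here are hypothetical placeholders, just to show how one asset moves from raw capture to the primary database:

```python
# Sketch of the six workflow stages as functions over an asset record.
# Stage names mirror the list above; the record fields are hypothetical.

def digitize(source):
    return {"source": source, "stage": "digitized"}

def tag_for_server(asset):        # label so Plex/Kodi can parse it
    asset.update(stage="tagged", server="secondary")
    return asset

def curate(asset):                # curator flags material for the primary DB
    asset.update(stage="curated", wanted=True)
    return asset

def process_edit(asset):          # extract the material the curator called for
    asset.update(stage="processed")
    return asset

def format_compress(asset):       # convert/compress for the primary database
    asset.update(stage="formatted")
    return asset

def tag_for_database(asset):      # apply naming convention, cross links
    asset.update(stage="cataloged", server="primary")
    return asset

PIPELINE = [digitize, tag_for_server, curate, process_edit,
            format_compress, tag_for_database]

def ingest(source):
    """Run one asset through every stage; in practice the stages run
    concurrently on many assets as volunteer time allows."""
    asset = source
    for step in PIPELINE:
        asset = step(asset)
    return asset
```

In reality nothing runs this linearly... different volunteers work different stages on different assets at the same time, which is why all of these steps are always going on at once.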