Below is the list of all MIME types (bitstream formats) supported by DSpace:
bitstream_format_id | mimetype | short_description | description | support_level | internal
---------------------+-------------------------------+----------------------+----------------------------------------------------------------------+---------------+----------
3 | application/pdf | PDF | Adobe Portable Document Format | 1 | f
1 | application/octet-stream | Unknown | Unknown data format | 0 | f
2 | text/plain | License | Item-specific license agreed upon to submission | 1 | t
4 | text/xml | XML | Extensible Markup Language | 1 | f
5 | text/plain | Text | Plain Text | 1 | f
6 | text/html | HTML | Hypertext Markup Language | 1 | f
7 | text/css | CSS | Cascading Style Sheets | 1 | f
8 | application/msword | Microsoft Word | Microsoft Word | 1 | f
9 | application/vnd.ms-powerpoint | Microsoft Powerpoint | Microsoft Powerpoint | 1 | f
10 | application/vnd.ms-excel | Microsoft Excel | Microsoft Excel | 1 | f
11 | application/marc | MARC | Machine-Readable Cataloging records | 1 | f
12 | image/jpeg | JPEG | Joint Photographic Experts Group/JPEG File Interchange Format (JFIF) | 1 | f
13 | image/gif | GIF | Graphics Interchange Format | 1 | f
14 | image/png | image/png | Portable Network Graphics | 1 | f
15 | image/tiff | TIFF | Tag Image File Format | 1 | f
16 | audio/x-aiff | AIFF | Audio Interchange File Format | 1 | f
17 | audio/basic | audio/basic | Basic Audio | 1 | f
18 | audio/x-wav | WAV | Broadcase Wave Format | 1 | f
19 | video/mpeg | MPEG | Moving Picture Experts Group | 1 | f
20 | text/richtext | RTF | Rich Text Format | 1 | f
21 | application/vnd.visio | Microsoft Visio | Microsoft Visio | 1 | f
22 | application/x-filemaker | FMP3 | Filemaker Pro | 1 | f
23 | image/x-ms-bmp | BMP | Microsoft Windows bitmap | 1 | f
24 | application/x-photoshop | Photoshop | Photoshop | 1 | f
25 | application/postscript | Postscript | Postscript Files | 1 | f
26 | video/quicktime | Video Quicktime | Video Quicktime | 1 | f
27 | audio/x-mpeg | MPEG Audio | MPEG Audio | 1 | f
28 | application/vnd.ms-project | Microsoft Project | Microsoft Project | 1 | f
29 | application/mathematica | Mathematica | Mathematica Notebook | 1 | f
30 | application/x-latex | LateX | LaTeX document | 1 | f
31 | application/x-tex | TeX | Tex/LateX document | 1 | f
32 | application/x-dvi | TeX dvi | TeX dvi format | 1 | f
33 | application/sgml | SGML | SGML application (RFC 1874) | 1 | f
34 | application/wordperfect5.1 | WordPerfect | WordPerfect 5.1 document | 1 | f
35 | audio/x-pn-realaudio | RealAudio | RealAudio file | 1 | f
36 | image/x-photo-cd | Photo CD | Kodak Photo CD image | 1 | f
A file with the wrong bitstream_format_id
handle | bitstream_id | bitstream_format_id | name | size_bytes | checksum | checksum_algorithm | description | user_format_description | source | internal_id | deleted | store_number | sequence_id
--------+--------------+---------------------+---------------------------------------------+------------+----------------------------------+--------------------+-------------+-------------------------+---------------------------------------------------------------------------------+-----------------------------------------+---------+--------------+-------------
95522 | 74367 | 1 | Staff Paper P10-8--InSTePP10-04.revised pdf | 313884 | 35f4304e6a0c68e935c09c0469a9e291 | MD5 | | | /dspace/assetstore/dspace-sr/upload/Staff Paper P10-8--InSTePP10-04.revised pdf | 102028865626877833459313413758816463357 | f | 0 | 2
(1 row)
This bitstream is labeled Unknown (bitstream_format_id 1), but it should be PDF (bitstream_format_id 3). The following statement corrected it:
dspace_sr=> UPDATE bitstream SET bitstream_format_id = '3' WHERE bitstream_id = '74367';
UPDATE 1
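An UPDATE like the one above can be given a guard so that it only touches the row while it still carries the old format id. The following is a minimal sketch, assuming a hypothetical shell helper named fix_format_sql (it is not part of DSpace or psql) that only prints the SQL; the table and column names are the ones used in the statement above.

```shell
#!/bin/sh
# fix_format_sql BITSTREAM_ID OLD_FORMAT_ID NEW_FORMAT_ID   (hypothetical helper)
# Prints a small transaction that changes the format of one bitstream.
# The extra "AND bitstream_format_id = OLD" condition makes the UPDATE a no-op
# if the row is no longer marked with the old format.
fix_format_sql() {
    # Accept only non-empty, purely numeric ids so nothing else ends up in the SQL.
    for id in "$1" "$2" "$3"; do
        case "$id" in
            ''|*[!0-9]*) echo "ids must be non-empty integers" >&2; return 1 ;;
        esac
    done
    cat <<EOF
BEGIN;
UPDATE bitstream SET bitstream_format_id = $3
 WHERE bitstream_id = $1 AND bitstream_format_id = $2;
SELECT bitstream_id, bitstream_format_id, name FROM bitstream WHERE bitstream_id = $1;
COMMIT;
EOF
}

# Print the statements for the bitstream discussed above (Unknown -> PDF).
fix_format_sql 74367 1 3

# To apply them, the output would be piped to psql against the live database:
# fix_format_sql 74367 1 3 | psql -U dspace_sr dspace_sr
```

The SELECT inside the transaction shows the row as it will look after the change, so the result can be checked in the same psql output that reports UPDATE 1 or UPDATE 0.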
Distribution of bitstream_format_id
The SQL query below selects only live bitstreams (not deleted, and belonging to items that have not been withdrawn):
[silvi003:~]$ cat cmdMime.sql
\f ','
\a
\t
\o outputfile.csv
SELECT bitstream_format_id FROM handle,item, item2bundle,bitstream,bundle2bitstream WHERE handle.resource_type_id=2 AND handle.resource_id = item2bundle.item_id AND item2bundle.bundle_id=bundle2bitstream.bundle_id AND handle.resource_id=item.item_id AND item.withdrawn='f' AND bundle2bitstream.bitstream_id = bitstream.bitstream_id AND bitstream.deleted = 'f' ;
\o
\q
[silvi003:~]$ psql -U dspace_sr dspace_sr < cmdMime.sql
Count of bitstreams per bitstream_format_id
[silvi003:~]$ cat outputfile.csv | sort | uniq -c | sort -n
count bitstream_format_id
1 1
2 10
20602 2
48265 3
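The same tally can be produced in a single awk pass, and each id can be paired with its short_description from the format table at the top of this page. The sketch below assumes two hypothetical helpers, count_formats and label_counts, and an assumed "id,short_description" export of the registry; none of these names come from DSpace itself.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layouts are assumptions for illustration.

# count_formats FILE
#   FILE holds one bitstream_format_id per line (the layout of outputfile.csv).
#   Prints "count id" for each distinct id, smallest count first.
count_formats() {
    awk 'NF { n[$1]++ } END { for (id in n) print n[id], id }' "$1" | sort -n
}

# label_counts REGISTRY COUNTS
#   REGISTRY holds "id,short_description" lines; COUNTS holds "count id" lines
#   (leading blanks, as printed by uniq -c, are ignored).
#   Appends the short_description to each count line.
label_counts() {
    awk 'NR == FNR { name[$1] = $2; next }
         { print $1, $2, (($2 in name) ? name[$2] : "(not in registry)") }' \
        FS=',' "$1" FS=' ' "$2"
}

# Demo: registry rows copied from the format table above, counts copied from
# the uniq -c output above.
registry=$(mktemp)
counts=$(mktemp)
printf '%s\n' '1,Unknown' '2,License' '3,PDF' '10,Microsoft Excel' > "$registry"
printf '%s\n' '1 1' '2 10' '20602 2' '48265 3' > "$counts"
label_counts "$registry" "$counts"
rm -f "$registry" "$counts"
```

For the demo input the output is:

    1 1 Unknown
    2 10 Microsoft Excel
    20602 2 License
    48265 3 PDF

On the real export, count_formats outputfile.csv would take the place of the sort | uniq -c | sort -n pipeline.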
The odd bitstream_format_id values
Most of the bitstreams are PDFs (bitstream_format_id 3) or licenses (bitstream_format_id 2). There is one Unknown (bitstream_format_id 1) and two Excel (bitstream_format_id 10). The Unknown and Excel bitstreams are shown below:
bitstream_format_id = 1
dspace_sr=> SELECT handle, bitstream.* FROM handle,item,item2bundle,bitstream,bundle2bitstream WHERE handle.resource_type_id=2 AND handle.resource_id = item2bundle.item_id AND item2bundle.bundle_id=bundle2bitstream.bundle_id AND handle.resource_id=item.item_id AND item.withdrawn='f' AND bundle2bitstream.bitstream_id = bitstream.bitstream_id AND bitstream.deleted = 'f' AND bitstream_format_id=1;
handle | bitstream_id | bitstream_format_id | name | size_bytes | checksum | checksum_algorithm | description | user_format_description | source | internal_id | deleted | store_number | sequence_id
--------+--------------+---------------------+---------------------------------------------------------+------------+----------------------------------+--------------------+-------------+-------------------------+---------------------------------------------------------------------------------------------+----------------------------------------+---------+--------------+-------------
62242 | 59248 | 1 | data appendix jayasinghe Beghin Moschini ajae 9007.xlsx | 141434 | 6f40baf7dd97f784091e69ed8714b837 | MD5 | | | /dspace/assetstore/dspace-sr/upload/data appendix jayasinghe Beghin Moschini ajae 9007.xlsx | 13124464764865665476393448862247227640 | f | 0 | 3
(1 row)
bitstream_format_id = 10
dspace_sr=> SELECT handle, bitstream.* FROM handle,item,item2bundle,bitstream,bundle2bitstream WHERE handle.resource_type_id=2 AND handle.resource_id = item2bundle.item_id AND item2bundle.bundle_id=bundle2bitstream.bundle_id AND handle.resource_id=item.item_id AND item.withdrawn='f' AND bundle2bitstream.bitstream_id = bitstream.bitstream_id AND bitstream.deleted = 'f' AND bitstream_format_id=10;
handle | bitstream_id | bitstream_format_id | name | size_bytes | checksum | checksum_algorithm | description | user_format_description | source | internal_id | deleted | store_number | sequence_id
--------+--------------+---------------------+-----------------------------------------+------------+----------------------------------+--------------------+----------------------------+-------------------------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+--------------+-------------
42187 | 32967 | 10 | MissouriUseValueCalculationsOct2007.xls | 361472 | cbefd6d4008ba5d97db49d2b9178f89f | MD5 | Excel Spreadsheet | | /dspace/assetstore/dspace-sr/upload/C:\Documents and Settings\Lori\My Documents\MissouriUseValueCalculationsOct2007.xls | 87756269209817911914027269532862968326 | f | 0 | 3
92231 | 61062 | 10 | stpap536.data.zip | 741798 | b48a9d5aa21f3d8411230bde4651e4fe | MD5 | Data in zipped Excel files | | /dspace/assetstore/dspace-sr/upload/stpap536.data.zip | 28095595994115196972466977473167819715 | f | 0 | 3
(2 rows)