
DSpace MIME types for AgEcon: very few Excel files

Below is a list of all the MIME types supported by DSpace:

 bitstream_format_id |           mimetype            |  short_description   |                             description                              | support_level | internal 
---------------------+-------------------------------+----------------------+----------------------------------------------------------------------+---------------+----------
                   3 | application/pdf               | PDF                  | Adobe Portable Document Format                                       |             1 | f
                   1 | application/octet-stream      | Unknown              | Unknown data format                                                  |             0 | f
                   2 | text/plain                    | License              | Item-specific license agreed upon to submission                      |             1 | t
                   4 | text/xml                      | XML                  | Extensible Markup Language                                           |             1 | f
                   5 | text/plain                    | Text                 | Plain Text                                                           |             1 | f
                   6 | text/html                     | HTML                 | Hypertext Markup Language                                            |             1 | f
                   7 | text/css                      | CSS                  | Cascading Style Sheets                                               |             1 | f
                   8 | application/msword            | Microsoft Word       | Microsoft Word                                                       |             1 | f
                   9 | application/vnd.ms-powerpoint | Microsoft Powerpoint | Microsoft Powerpoint                                                 |             1 | f
                  10 | application/vnd.ms-excel      | Microsoft Excel      | Microsoft Excel                                                      |             1 | f
                  11 | application/marc              | MARC                 | Machine-Readable Cataloging records                                  |             1 | f
                  12 | image/jpeg                    | JPEG                 | Joint Photographic Experts Group/JPEG File Interchange Format (JFIF) |             1 | f
                  13 | image/gif                     | GIF                  | Graphics Interchange Format                                          |             1 | f
                  14 | image/png                     | image/png            | Portable Network Graphics                                            |             1 | f
                  15 | image/tiff                    | TIFF                 | Tag Image File Format                                                |             1 | f
                  16 | audio/x-aiff                  | AIFF                 | Audio Interchange File Format                                        |             1 | f
                  17 | audio/basic                   | audio/basic          | Basic Audio                                                          |             1 | f
                  18 | audio/x-wav                   | WAV                  | Broadcase Wave Format                                                |             1 | f
                  19 | video/mpeg                    | MPEG                 | Moving Picture Experts Group                                         |             1 | f
                  20 | text/richtext                 | RTF                  | Rich Text Format                                                     |             1 | f
                  21 | application/vnd.visio         | Microsoft Visio      | Microsoft Visio                                                      |             1 | f
                  22 | application/x-filemaker       | FMP3                 | Filemaker Pro                                                        |             1 | f
                  23 | image/x-ms-bmp                | BMP                  | Microsoft Windows bitmap                                             |             1 | f
                  24 | application/x-photoshop       | Photoshop            | Photoshop                                                            |             1 | f
                  25 | application/postscript        | Postscript           | Postscript Files                                                     |             1 | f
                  26 | video/quicktime               | Video Quicktime      | Video Quicktime                                                      |             1 | f
                  27 | audio/x-mpeg                  | MPEG Audio           | MPEG Audio                                                           |             1 | f
                  28 | application/vnd.ms-project    | Microsoft Project    | Microsoft Project                                                    |             1 | f
                  29 | application/mathematica       | Mathematica          | Mathematica Notebook                                                 |             1 | f
                  30 | application/x-latex           | LateX                | LaTeX document                                                       |             1 | f
                  31 | application/x-tex             | TeX                  | Tex/LateX document                                                   |             1 | f
                  32 | application/x-dvi             | TeX dvi              | TeX dvi format                                                       |             1 | f
                  33 | application/sgml              | SGML                 | SGML application (RFC 1874)                                          |             1 | f
                  34 | application/wordperfect5.1    | WordPerfect          | WordPerfect 5.1 document                                             |             1 | f
                  35 | audio/x-pn-realaudio          | RealAudio            | RealAudio file                                                       |             1 | f
                  36 | image/x-photo-cd              | Photo CD             | Kodak Photo CD image                                                 |             1 | f
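If you want to use this registry in a script, for example to look up a MIME type by ID, you can parse psql's aligned output by splitting on "|". The sketch below assumes the output was saved as text exactly as shown, and that no cell contains a "|". The function name parse_psql_table and the sample rows are illustrative. If you rerun the query with \a (unaligned mode, as the script later in this post does), the output is easier to parse.

```python
# Sample copied from a few rows of the registry table above.
SAMPLE = """\
 bitstream_format_id |           mimetype            | short_description
---------------------+-------------------------------+-------------------
                   3 | application/pdf               | PDF
                   1 | application/octet-stream      | Unknown
                   2 | text/plain                    | License
                   5 | text/plain                    | Text
                  10 | application/vnd.ms-excel      | Microsoft Excel
"""


def parse_psql_table(text):
    """Turn psql aligned output (header, dashed rule, rows) into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the ----+---- rule
        if line.strip().startswith("("):  # footer such as "(1 row)"
            continue
        cells = [c.strip() for c in line.split("|")]  # assumes no "|" inside cells
        rows.append(dict(zip(header, cells)))
    return rows


registry = parse_psql_table(SAMPLE)
# Key by ID: text/plain appears twice (License and Text), so MIME type is not unique.
by_id = {int(r["bitstream_format_id"]): r["mimetype"] for r in registry}
print(by_id[10])
```

Keying the lookup by bitstream_format_id rather than by MIME type is deliberate. The registry maps text/plain to two different formats (IDs 2 and 5).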

A file with the wrong bitstream_format_id

 
handle | bitstream_id | bitstream_format_id |                    name                     | size_bytes |             checksum             | checksum_algorithm | description | user_format_description |                                     source                                      |               internal_id               | deleted | store_number | sequence_id 
--------+--------------+---------------------+---------------------------------------------+------------+----------------------------------+--------------------+-------------+-------------------------+---------------------------------------------------------------------------------+-----------------------------------------+---------+--------------+-------------
 95522  |        74367 |                   1 | Staff Paper P10-8--InSTePP10-04.revised pdf |     313884 | 35f4304e6a0c68e935c09c0469a9e291 | MD5                |             |                         | /dspace/assetstore/dspace-sr/upload/Staff Paper P10-8--InSTePP10-04.revised pdf | 102028865626877833459313413758816463357 | f       |            0 |           2
(1 row)

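This bitstream is labeled Unknown (bitstream_format_id 1) but should be PDF (bitstream_format_id 3). Note that its name ends in "revised pdf" rather than ".pdf". The statement below changed it: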
 
dspace_sr=> 
dspace_sr=> UPDATE bitstream SET bitstream_format_id = '3' WHERE bitstream_id = '74367';
UPDATE 1
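You can also script the same fix. The sketch below assumes a Python database driver that uses %s placeholders, such as psycopg2. The helper name fix_format_sql is hypothetical. The sketch passes the IDs as integers instead of the quoted '3' and '74367', and it also requires the old format ID in the WHERE clause. With that guard, running the statement a second time matches no rows instead of overwriting a later correction.

```python
def fix_format_sql(bitstream_id, new_format_id, expected_old_id=1):
    """Build a guarded UPDATE for one bitstream (hypothetical helper).

    Returns the statement and its parameters using %s placeholders, the
    style used by psycopg2. Requiring the old format ID in the WHERE clause
    means a second run matches no rows instead of overwriting a later fix.
    """
    sql = (
        "UPDATE bitstream SET bitstream_format_id = %s "
        "WHERE bitstream_id = %s AND bitstream_format_id = %s"
    )
    return sql, (new_format_id, bitstream_id, expected_old_id)


sql, params = fix_format_sql(74367, 3)  # Unknown (1) -> PDF (3)
# For display only; give sql and params to the driver separately.
print(sql % params)
```

With psycopg2 this would be cur.execute(sql, params). Afterwards cur.rowcount should be 1 on the first run and 0 on any later run.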

Distribution of bitstream_format_id

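The psql script below pulls only live bitstreams: ones that are not deleted and that belong to items that are not withdrawn. It sets comma-separated (\f ','), unaligned (\a), tuples-only (\t) output and writes the result to outputfile.csv (\o):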
[silvi003:~]$ cat cmdMime.sql 
\f ','
\a
\t
\o outputfile.csv
SELECT bitstream_format_id  FROM handle,item, item2bundle,bitstream,bundle2bitstream WHERE  handle.resource_type_id=2 AND handle.resource_id = item2bundle.item_id AND item2bundle.bundle_id=bundle2bitstream.bundle_id AND handle.resource_id=item.item_id AND item.withdrawn='f' AND   bundle2bitstream.bitstream_id = bitstream.bitstream_id AND  bitstream.deleted = 'f'  ;
\o
\q
[silvi003:~]$ psql -U dspace_sr  dspace_sr  < cmdMime.sql
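You can check the join logic without touching the production database by replaying it against a toy in-memory SQLite database. This is a sketch with hypothetical rows. It creates only the columns the query uses (the real DSpace tables have many more) and stores PostgreSQL's booleans as the text 't'/'f' that psql prints. In DSpace, resource_type_id = 2 is the item type, so the first condition keeps only item handles.

```python
import sqlite3

# Toy schema: only the columns the query touches (real DSpace tables have more).
# PostgreSQL booleans are modeled as the text 't'/'f' that psql prints.
SETUP = """
CREATE TABLE handle (handle TEXT, resource_type_id INTEGER, resource_id INTEGER);
CREATE TABLE item (item_id INTEGER, withdrawn TEXT);
CREATE TABLE item2bundle (item_id INTEGER, bundle_id INTEGER);
CREATE TABLE bundle2bitstream (bundle_id INTEGER, bitstream_id INTEGER);
CREATE TABLE bitstream (bitstream_id INTEGER, bitstream_format_id INTEGER, deleted TEXT);

-- Hypothetical rows: item 20 is withdrawn, bitstream 1002 is deleted.
INSERT INTO handle VALUES ('h/1', 2, 10), ('h/2', 2, 20);
INSERT INTO item VALUES (10, 'f'), (20, 't');
INSERT INTO item2bundle VALUES (10, 100), (20, 200);
INSERT INTO bundle2bitstream VALUES (100, 1000), (100, 1001), (100, 1002), (200, 2000);
INSERT INTO bitstream VALUES (1000, 3, 'f'), (1001, 2, 'f'), (1002, 3, 't'), (2000, 3, 'f');
"""

# Same joins and filters as cmdMime.sql.
LIVE_FORMATS = """
SELECT bitstream.bitstream_format_id
FROM handle, item, item2bundle, bundle2bitstream, bitstream
WHERE handle.resource_type_id = 2
  AND handle.resource_id = item.item_id
  AND item.withdrawn = 'f'
  AND handle.resource_id = item2bundle.item_id
  AND item2bundle.bundle_id = bundle2bitstream.bundle_id
  AND bundle2bitstream.bitstream_id = bitstream.bitstream_id
  AND bitstream.deleted = 'f'
"""


def live_format_ids(conn):
    """Return the format IDs of live bitstreams, sorted for stable output."""
    return sorted(row[0] for row in conn.execute(LIVE_FORMATS))


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(live_format_ids(conn))  # [2, 3]
```

The query excludes both the withdrawn item's bitstream (2000) and the deleted bitstream (1002), which leaves the license (2) and the PDF (3).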
Count of each bitstream_format_id:

[silvi003:~]$ cat outputfile.csv | sort | uniq -c | sort -n 
      # bitstream_format_id
      1 1
      2 10
  20602 2
  48265 3

The odd bitstream_format_ids

Most of the bitstreams are PDFs (bitstream_format_id 3) or licenses (bitstream_format_id 2). There is one labeled Unknown (bitstream_format_id 1) and two labeled Excel (bitstream_format_id 10). They are shown below:
bitstream_format_id = 1
dspace_sr=> SELECT handle, bitstream.* FROM handle, item, item2bundle, bitstream, bundle2bitstream WHERE handle.resource_type_id=2 AND handle.resource_id = item2bundle.item_id AND item2bundle.bundle_id=bundle2bitstream.bundle_id AND handle.resource_id=item.item_id AND item.withdrawn='f' AND bundle2bitstream.bitstream_id = bitstream.bitstream_id AND bitstream.deleted = 'f' AND bitstream_format_id=1;
 handle | bitstream_id | bitstream_format_id |                          name                           | size_bytes |             checksum             | checksum_algorithm | description | user_format_description |                                           source                                            |              internal_id               | deleted | store_number | sequence_id 
--------+--------------+---------------------+---------------------------------------------------------+------------+----------------------------------+--------------------+-------------+-------------------------+---------------------------------------------------------------------------------------------+----------------------------------------+---------+--------------+-------------
 62242  |        59248 |                   1 | data appendix jayasinghe Beghin Moschini ajae 9007.xlsx |     141434 | 6f40baf7dd97f784091e69ed8714b837 | MD5                |             |                         | /dspace/assetstore/dspace-sr/upload/data appendix jayasinghe Beghin Moschini ajae 9007.xlsx | 13124464764865665476393448862247227640 | f       |            0 |           3
(1 row)


bitstream_format_id = 10
dspace_sr=> SELECT handle, bitstream.* FROM handle, item, item2bundle, bitstream, bundle2bitstream WHERE handle.resource_type_id=2 AND handle.resource_id = item2bundle.item_id AND item2bundle.bundle_id=bundle2bitstream.bundle_id AND handle.resource_id=item.item_id AND item.withdrawn='f' AND bundle2bitstream.bitstream_id = bitstream.bitstream_id AND bitstream.deleted = 'f' AND bitstream_format_id=10;
 handle | bitstream_id | bitstream_format_id |                  name                   | size_bytes |             checksum             | checksum_algorithm |        description         | user_format_description |                                                         source                                                          |              internal_id               | deleted | store_number | sequence_id 
--------+--------------+---------------------+-----------------------------------------+------------+----------------------------------+--------------------+----------------------------+-------------------------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+--------------+-------------
 42187  |        32967 |                  10 | MissouriUseValueCalculationsOct2007.xls |     361472 | cbefd6d4008ba5d97db49d2b9178f89f | MD5                | Excel Spreadsheet          |                         | /dspace/assetstore/dspace-sr/upload/C:\Documents and Settings\Lori\My Documents\MissouriUseValueCalculationsOct2007.xls | 87756269209817911914027269532862968326 | f       |            0 |           3
 92231  |        61062 |                  10 | stpap536.data.zip                       |     741798 | b48a9d5aa21f3d8411230bde4651e4fe | MD5                | Data in zipped Excel files |                         | /dspace/assetstore/dspace-sr/upload/stpap536.data.zip                                                                   | 28095595994115196972466977473167819715 | f       |            0 |           3
(2 rows)
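These rows show that the stored format and the file name can disagree in both directions. A PDF was stored as Unknown, and a .zip is stored as Excel. Meanwhile the .xlsx file stays Unknown, because the registry listed above has no row for it. You can surface such rows by cross-checking stored IDs against file name extensions. The sketch below assumes a hand-written extension map that covers only a few registry rows. EXT_TO_FORMAT, guess_format and audit are hypothetical names, and the map is not DSpace's own configuration.

```python
import re

# Hypothetical extension map covering a few registry rows shown above.
# Extensions with no registry row (e.g. xlsx, zip) fall back to Unknown (1).
EXT_TO_FORMAT = {
    "pdf": 3, "xml": 4, "txt": 5, "html": 6, "htm": 6, "css": 7,
    "doc": 8, "ppt": 9, "xls": 10, "jpg": 12, "jpeg": 12,
    "gif": 13, "png": 14, "tif": 15, "tiff": 15,
}
UNKNOWN = 1


def guess_format(name):
    """Guess a format ID from the last dot- or space-separated token of a name."""
    # Splitting on spaces too lets "...revised pdf" still yield "pdf".
    ext = re.split(r"[.\s]", name.strip())[-1].lower()
    return EXT_TO_FORMAT.get(ext, UNKNOWN)


def audit(bitstreams):
    """Return (bitstream_id, name, stored_id, guessed_id) rows that disagree."""
    flagged = []
    for bitstream_id, name, stored_id in bitstreams:
        guessed_id = guess_format(name)
        if guessed_id != stored_id:
            flagged.append((bitstream_id, name, stored_id, guessed_id))
    return flagged


# Rows from this post; 74367 is shown with its format ID before the UPDATE.
rows = [
    (74367, "Staff Paper P10-8--InSTePP10-04.revised pdf", 1),
    (59248, "data appendix jayasinghe Beghin Moschini ajae 9007.xlsx", 1),
    (32967, "MissouriUseValueCalculationsOct2007.xls", 10),
    (61062, "stpap536.data.zip", 10),
]
for flagged in audit(rows):
    print(flagged)
```

This flags the PDF (suggesting 3) and the zip (suggesting Unknown). It cannot flag the .xlsx file, because Unknown is the best match the registry as listed offers for it.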

