Multi-View Datasets
Version: 2021-12-23
---------------------------------------------

This archive contains the Twitter datasets used in the paper:

  D. Greene and P. Cunningham (2013). "Producing a Unified Graph
  Representation from Multiple Social Network Views".

For more details visit http://mlg.ucd.ie/aggregation/index.html

This archive includes five datasets (football, olympics, politics-ie,
politics-uk, rugby) collected from Twitter in 2012. The datasets are made
available for non-commercial research purposes only. They are provided in
pre-processed matrix format only; to comply with the Twitter Terms of
Service, no raw tweets or other full-text content is included. Users and
user lists are referenced by their unique Twitter IDs rather than by full
names or screen names.

The datasets are provided in a single archive. Each dataset is contained
within its own sub-directory, and 9 different "views" of each dataset are
provided in sparse matrix representation. For a dataset <dataset>, the view
files have the following prefixes:

* <dataset>-follows, <dataset>-followedby, <dataset>-mentions,
  <dataset>-mentionedby, <dataset>-retweets, <dataset>-retweetedby,
  <dataset>-listmerged500, <dataset>-lists500, <dataset>-tweets500

The formats of the files in each sub-directory are as follows:

* <dataset>.ids: List of all users for the dataset, specified as one user
  ID per line.
* <dataset>.communities: The ground-truth community information, with one
  community per line. Community members are specified in terms of their
  user IDs.
* <dataset>-<view>.mtx: Feature-user matrix for the view <view> of the
  dataset. The users (columns) are specified in terms of their user IDs.
* <dataset>-<view>.features: The names or identifiers of the features
  corresponding to the feature-user matrix for the view <view>, with one
  feature name per line.

If you find this data useful, we would encourage you to cite the paper
above.

---------------------------------------------

This archive also contains the Wikipedia dataset. For more details visit
http://www.svcl.ucsd.edu/projects/crossmodal/

The Wikipedia dataset was selected from Wikipedia's featured-articles
collection. It consists of 2866 image/text pairs belonging to 10
categories. Each image is represented by a 2296-dimensional feature
vector, and each text is represented by a 3000-dimensional bag-of-words
vector. The dataset was originally restructured into six views, with the
first view being the image and the other views being text; in the
downloaded dataset, however, the text views are represented as a single
matrix, so this dataset is treated as a two-view dataset.

---------------------------------------------

This archive also contains the Handwritten dataset, collected from
https://archive.ics.uci.edu/ml/datasets/Multiple+Features

The dataset contains 2000 images of handwritten digits in 10 classes,
corresponding to the numbers 0 to 9. Six types of descriptors were
extracted: Pix (view 1), Fou (view 2), Fac (view 3), ZER (view 4), KAR
(view 5), and MOR (view 6).

---------------------------------------------

This archive also contains the WPascal dataset used in the paper:

  C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier (2010).
  "Collecting Image Annotations Using Amazon's Mechanical Turk". NAACL
  HLT 2010 Workshop on Creating Speech and Language Data with Amazon's
  Mechanical Turk.

This dataset includes 1000 image/text pairs belonging to 20 categories,
with 50 cases per category; each image is tagged with five sentences. The
dataset was restructured into six views: the first view is the image,
represented by a 1024-dimensional feature vector, and the other five views
are text, each represented by a 222-dimensional feature vector.
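The per-dataset files described above use the standard MatrixMarket sparse format, so they can be read with common scientific-Python tooling. Below is a minimal sketch, assuming SciPy is installed; the directory layout follows the naming scheme described above, and the example names (`football`, `follows`) are illustrative only:

```python
from pathlib import Path
from scipy.io import mmread


def load_view(dataset_dir, view):
    """Load one view of a dataset laid out as described in this README.

    dataset_dir: sub-directory named after the dataset (e.g. ".../football")
    view: view suffix (e.g. "follows", "mentionedby", "tweets500")
    Returns (X, feature_names, user_ids), where X is a sparse
    feature-user matrix with one column per user.
    """
    d = Path(dataset_dir)
    name = d.name  # dataset prefix, e.g. "football"

    # <dataset>-<view>.mtx: sparse feature-user matrix (MatrixMarket format)
    X = mmread(str(d / f"{name}-{view}.mtx")).tocsc()

    # <dataset>-<view>.features: one feature name per line (rows of X)
    feature_names = (d / f"{name}-{view}.features").read_text().splitlines()

    # <dataset>.ids: one Twitter user ID per line (columns of X)
    user_ids = (d / f"{name}.ids").read_text().split()

    return X, feature_names, user_ids
```

Converting to CSC (compressed sparse column) makes per-user column slices cheap, which fits the feature-user orientation of these matrices.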