ACSANet

Welcome to the ACSA Complex Networks Dataset Collection

cs.upt.ro/~alext/acsanet


The ACSANet repository was created on September 30th, 2014 and keeps growing as a result of the work of the ACSA Research Group, which pursues data mining and analysis of various social networks and beyond. All the datasets available on this page were collected and used for the purposes of our research.

We offer the data to any interested researchers and encourage you to cite the corresponding publications or the repository itself.

Facebook Ego-networks

We have collected a number of Facebook ego-networks extracted using the third-party application netvizz (no longer available online due to changes in Facebook's privacy policy). The networks are anonymized and each represents the friendship network of one user: a list of the user's friends (node list) and the connections between those friends (edge list), in the gdf text format (can be opened in Gephi). The gdf file format definition can be found here.

Node definition

nodedef>name VARCHAR,label VARCHAR,sex VARCHAR,locale VARCHAR,agerank INT

1,ID,female,en_US,204

2,SB,male,ro_RO,203

3,PA,male,en_US,202

Edge definition

edgedef>node1 VARCHAR,node2 VARCHAR

1,5

1,9

2,4
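For readers who prefer to load the gdf files outside Gephi, the following is a minimal sketch of a reader for the layout shown above. It assumes the file contains one `nodedef>` header and one `edgedef>` header, each followed by comma-separated rows; the function name `parse_gdf` is our own and not part of any library.

```python
import csv
import io

def parse_gdf(text):
    """Split GDF text into (node_rows, edge_rows), each a list of dicts."""
    nodes, edges = [], []
    columns, target = None, None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        first = row[0]
        if first.startswith("nodedef>") or first.startswith("edgedef>"):
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Header cells look like "name VARCHAR"; keep only the column name.
            columns = [cell.split()[0] for cell in row]
        elif columns is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges

# The excerpt shown above, reproduced as a string.
sample = """nodedef>name VARCHAR,label VARCHAR,sex VARCHAR,locale VARCHAR,agerank INT
1,ID,female,en_US,204
2,SB,male,ro_RO,203
3,PA,male,en_US,202
edgedef>node1 VARCHAR,node2 VARCHAR
1,5
1,9
2,4
"""
nodes, edges = parse_gdf(sample)
print(len(nodes), "nodes,", len(edges), "edges")
```

All values are kept as strings; converting `agerank` to an integer is left to the caller.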

The data is free to use and can be downloaded here as a ZIP archive. Please cite the ACSANet repository or our paper in which the data was used:

Bibtex

@incollection{topirceanu2014genetically, title={Genetically Optimized Realistic Social Network Topology Inspired by Facebook}, author={Topirceanu, Alexandru and Udrescu, Mihai and Vladutiu, Mircea}, booktitle={Online Social Media Analysis and Visualization}, pages={163--179}, year={2014}, publisher={Springer} }

APA

Topirceanu, A., Udrescu, M., & Vladutiu, M. (2014). Genetically Optimized Realistic Social Network Topology Inspired by Facebook. In Online Social Media Analysis and Visualization (pp. 163-179). Springer International Publishing.

MuSeNet: Musical Artists Collaboration Network

MuSeNet is a comprehensive collaboration network of musicians extracted from All Music Guide, provided in gml text format. It consists of 19,882 nodes and over 75K edges, with nodes being artists and edges representing a professional collaboration between two artists.


Download the Gephi project file from here: musenet-limited-with-filters.gephi (3.4MB). Open the file in Gephi and apply the provided giant component filter to keep only artists with professional collaborations (6951 nodes, 41197 edges). Use the ForceAtlas2, Yifan Hu multilevel or OpenOrd layouts for visualization. For a 3D visualization of the network, download and install the ForceAtlas3D plugin.
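A giant component filter keeps the largest connected component of the graph. As a rough illustration of what that means outside Gephi, here is a small sketch using breadth-first search on an undirected edge list. The function name and the toy edge list are hypothetical and not taken from the MuSeNet data.

```python
from collections import defaultdict, deque

def giant_component(edges):
    """Return the node set of the largest connected component (undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Toy collaboration graph with two components.
toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
largest = giant_component(toy_edges)
print(sorted(largest))
```

Nodes without any edge never appear in an edge list, so they are excluded by construction.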

Bibtex

@inproceedings{topirceanu2014musenet, title={MuSeNet: Collaboration in the Music Artists Industry}, author={Topirceanu, Alexandru and Barina, Gabriel and Udrescu, Mihai}, booktitle={Network Intelligence Conference (ENIC), 2014 European}, pages={89--94}, year={2014}, organization={IEEE} }

APA

Topirceanu, A., Barina, G., & Udrescu, M. (2014, September). MuSeNet: Collaboration in the Music Artists Industry. In Network Intelligence Conference (ENIC), 2014 European (pp. 89-94). IEEE.

FMNet: The Fashion Model Dataset

FMNet is a comprehensive database of female models extracted from the Fashion Model Directory (FMD), provided in text format. FMD is the largest online database of female fashion models, designers, fashion brands, magazines and editorials. Our research aimed at collecting all the available data from FMD and modeling a physical similarity network as well as a collaboration network. The former consists of edges connecting two models (nodes) if they share at least one physical trait from: eye color, hair color, nationality, age and height (cm). The latter consists of edges connecting two models (nodes) if they have worked for the same agency.
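The two edge rules can be sketched as follows. This is an illustration under assumptions, not the code used in the paper: "sharing" a trait is taken to mean exact equality, birth year stands in for age, and the three records and all identifiers are hypothetical.

```python
from itertools import combinations

# Traits compared for the physical similarity network (birth year as a proxy for age).
TRAITS = ("eyes", "hair", "nationality", "birth_year", "height")

def physical_edges(models):
    """Connect two models if at least one trait value is identical."""
    edges = []
    for (a, ta), (b, tb) in combinations(models.items(), 2):
        if any(ta.get(t) is not None and ta.get(t) == tb.get(t) for t in TRAITS):
            edges.append((a, b))
    return edges

def agency_edges(models):
    """Connect two models if they have at least one agency in common."""
    edges = []
    for (a, ta), (b, tb) in combinations(models.items(), 2):
        if set(ta.get("agencies", ())) & set(tb.get("agencies", ())):
            edges.append((a, b))
    return edges

# Hypothetical records, for illustration only.
models = {
    "model_a": {"eyes": "blue", "hair": "blonde", "nationality": "norwegian",
                "birth_year": 1995, "height": 176, "agencies": ["agency_x", "agency_y"]},
    "model_b": {"eyes": "brown", "hair": "light brown", "nationality": "mexican",
                "birth_year": 1970, "height": 175, "agencies": ["agency_z"]},
    "model_c": {"eyes": "blue", "hair": "brown", "nationality": "german",
                "birth_year": 1990, "height": 180, "agencies": ["agency_y", "agency_z"]},
}
print(physical_edges(models))
print(agency_edges(models))
```

Comparing every pair is quadratic in the number of models; for a large database, grouping models by trait value first would avoid most of the comparisons.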


Download the Gephi project file from here: fmdb_physical.gephi (3.1MB). Open the file in Gephi and use the ForceAtlas2, Yifan Hu multilevel or OpenOrd layouts for visualization. For a 3D visualization of the network, download and install the ForceAtlas3D plugin. The raw database with all models can be downloaded here as a TXT file. The database format is the following:

Node definition

Frida Aasen

Nationality:norwegian

Birth_Year:1995

Birth_Place:norway

Hair:blonde

Eyes:blue

Height:176

Agency:women management - new york

Agency:modelwerk

Agency:select model management

Agency:scoop models - copenhagen

Agency:women management - milan

Agency:women management - paris

*

Adriana Abascal

Nationality:mexican

Birth_Year:1970

Birth_Place:veracruz, mexico

Hair:light brown

Eyes:brown

Height:175

Agency:view management - spain

Advertisement:diet coke

Advertisement:suarez jewelry

Cover:grazia

Cover:yo dona

Cover:telva

Cover:vanity fair

*

...
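A record in this format can be read with a few lines of code. The sketch below assumes what the excerpt suggests: each record starts with the model's name, continues with `Key:Value` lines, and ends with a line containing only `*`. The function name `parse_fmd` is our own.

```python
def parse_fmd(text):
    """Parse FMD-style records into a list of dicts.

    Keys such as Agency or Cover can repeat, so every value is stored in a list.
    """
    models, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between fields
        if line == "*":
            # A lone asterisk closes the current record.
            if current is not None:
                models.append(current)
            current = None
        elif current is None:
            current = {"name": line}  # first line of a record is the name
        else:
            key, _, value = line.partition(":")
            current.setdefault(key, []).append(value)
    if current is not None:
        models.append(current)  # tolerate a missing final "*"
    return models

# One record copied from the excerpt above.
sample = """Adriana Abascal
Nationality:mexican
Birth_Year:1970
Birth_Place:veracruz, mexico
Hair:light brown
Eyes:brown
Height:175
Agency:view management - spain
Advertisement:diet coke
Advertisement:suarez jewelry
Cover:grazia
Cover:yo dona
Cover:telva
Cover:vanity fair
*
"""
records = parse_fmd(sample)
print(records[0]["name"], records[0]["Cover"])
```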

The data is free to use, but please cite the ACSANet repository or our paper in which the data was used.

Bibtex

@inproceedings{topirceanu2015fmnet, title={FMNet: Physical Trait Patterns in the Fashion World}, author={Topirceanu, Alexandru and Udrescu, Mihai}, booktitle={Network Intelligence Conference (ENIC), 2015 Second European}, pages={25--32}, year={2015}, organization={IEEE} }

APA

Topirceanu, A., & Udrescu, M. (2015, September). FMNet: Physical Trait Patterns in the Fashion World. In Network Intelligence Conference (ENIC), 2015 Second European (pp. 25-32). IEEE.

Social Cities: urban road networks

Social Cities is a research direction of the ACSA lab through which we want to analyze and bring topological traffic optimizations to urban areas. To this end, we model intersections connected through streets as complex networks and measure representative and recurrent patterns with the help of network motifs (Alon, Nature 2007). Through the unique and characteristic distribution of motifs (sizes 3-6) in road networks, we are able to compare cities in terms of how "social" they are. In our vision, a more socially inclined city exhibits a more uniform traffic distribution, fewer hot-spots, and less delay caused by traffic. Our current work is directed towards analyzing multiple empirical datasets and understanding which topological factors can be optimized to reduce the negative effects of intense urban traffic.
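To give a feel for what a motif count is, the sketch below handles only the smallest case: the two connected 3-node patterns in an undirected simple graph, namely the open path and the triangle. It is a toy illustration and not the method or tooling used in the paper, which covers sizes 3 to 6; the function name and the example street layout are hypothetical.

```python
from itertools import combinations

def size3_motifs(edges):
    """Count open 3-node paths and triangles in an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = 0
    paths = 0
    for center, nbrs in adj.items():
        # Every pair of neighbours of `center` forms a 3-node pattern around it.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                triangles += 1  # seen once from each of the three corners
            else:
                paths += 1      # open path a - center - b
    return {"path": paths, "triangle": triangles // 3}

# Four intersections forming a city block (1-2-3-4), plus a diagonal street 1-3.
block = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(size3_motifs(block))
```

Comparing cities would then mean comparing such counts, normalized by network size, across the motif types.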

You may download an archive of Gephi projects containing the road networks of: Augsburg (D), Braila (Ro), Bratislava (Sk), Budapest (Hu), Cluj-Napoca (Ro), Constanta (Ro) and Timisoara (Ro). If you use this data, please cite our research paper:

Bibtex

@inproceedings{topirceanu2014social, title={Social cities: Quality assessment of road infrastructures using a network motif approach}, author={Topirceanu, Alexandru and Iovanovici, Alexandru and Udrescu, Mihai and Vladutiu, Mircea}, booktitle={System Theory, Control and Computing (ICSTCC), 2014 18th International Conference}, pages={803--808}, year={2014}, organization={IEEE} }

APA

Topirceanu, A., Iovanovici, A., Udrescu, M., & Vladutiu, M. (2014, October). Social cities: Quality assessment of road infrastructures using a network motif approach. In System Theory, Control and Computing (ICSTCC), 2014 18th International Conference (pp. 803-808). IEEE.

UPT.social: a new online social network

UPT.social is a new online social network built on top of diaspora* and dedicated to the students and academic staff of the Faculty of Automation and Computers of Politehnica University Timisoara (UPT) in Romania. The provided datasets are snapshots of the network at day 0 (pre-launch), day 3, day 7, day 15, day 24 and day 44. The starting size of the network is 239 nodes (129 edges) in D0, and the final size is 351 nodes (2126 edges) in D44, but the network is still online and growing. The six provided files contain anonymized, directed, unweighted edge data describing who is following whom. In each line, the first hashed value is the id of the source node, which follows the user identified by the second hashed id. Thus, the files can be directly imported in Gephi as a directed graph and the corresponding nodes will be created automatically.

Visualizations of the snapshots from D0, D3 and D44 (from left to right).

Download the Gephi project files from here: upt.social.zip (94KB). Open the files in Gephi and use the ForceAtlas2, Yifan Hu multilevel or OpenOrd layouts for visualization. For a 3D visualization of the network, download and install the ForceAtlas3D plugin. The gdf format is the following:

Node definition

nodedef>name VARCHAR,label VARCHAR

edgedef>node1 VARCHAR,node2 VARCHAR

# hash of source user, hash of target user: represents a directed unweighted edge

356a192b7913b04c54574d18c28d46e6395428ab,77de68daecd823babbb58edb1c8e14d7106e83bb

...
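A snapshot in this layout can also be processed directly. The sketch below reads the edge rows and counts followers per user (in-degree). It assumes that edge rows appear only after the `edgedef>` header and that lines starting with `#` are comments; the function name is our own, and the second and third edge rows in the sample are hypothetical additions to the single row shown above.

```python
from collections import Counter

def read_follow_edges(text):
    """Return (source, target) pairs found after the edgedef> header."""
    edges, in_edges = [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if line.startswith("nodedef>"):
            in_edges = False
        elif line.startswith("edgedef>"):
            in_edges = True
        elif in_edges:
            source, target = line.split(",")
            edges.append((source, target))  # source follows target
    return edges

A = "356a192b7913b04c54574d18c28d46e6395428ab"
B = "77de68daecd823babbb58edb1c8e14d7106e83bb"
sample = f"""nodedef>name VARCHAR,label VARCHAR
edgedef>node1 VARCHAR,node2 VARCHAR
# hash of source user, hash of target user
{A},{B}
{B},{A}
hypothetical_user_hash,{B}
"""
follow_edges = read_follow_edges(sample)
followers = Counter(target for _, target in follow_edges)
print(followers.most_common(1))
```

Running the same count on each of the six snapshots gives one simple view of how the follower distribution changes as the network grows.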

The data is free to use, but please cite the ACSANet repository or our paper in which the data was used:

Bibtex

@inproceedings{topirceanu2016upt, title={UPT. Social: The Growth of a New Online Social Network}, author={Topirceanu, Alexandru and Garcia, Jorge and Udrescu, Mihai}, booktitle={Network Intelligence Conference (ENIC), 2016 Third European}, pages={9--16}, year={2016}, organization={IEEE} }

APA

Topirceanu, A., Garcia, J., & Udrescu, M. (2016, September). UPT. Social: The Growth of a New Online Social Network. In Network Intelligence Conference (ENIC), 2016 Third European (pp. 9-16). IEEE.

Strength of nations: A case study on estimating the influence of leading countries using social media analysis

Event Registry is a free* data aggregation service that collects news articles from mainstream news media around the world. Information about new content is obtained through RSS feeds. The collected content comes from more than 30,000 news sources worldwide, and the daily volume is between 250,000 and 300,000 news articles. The most represented languages are English, German, Spanish, French, Chinese and Italian.

Each event that is mentioned in the news is usually reported by several news publishers. By analyzing the content of news articles, Event Registry is able to identify groups of articles describing the same event. Event Registry can even find articles in different languages that are describing the same event and treat them as the same entity.

By analyzing the articles assigned to the event, Event Registry is able to extract the main information about the event: where it happened, when it happened, what the event is about, and who was involved.

Our study aimed at extracting information about 5 country sets (EU, G8, G8+5, G20 and NATO) based on the publication of events containing any of the 7 #tags: Economics, Finance, GDP, Industry, Military, Politics, Warfare. The results obtained through the Python API are aggregated into 35 (5x7) different graph files available here (25 KB) in gml text format. Each country is represented by a node, and the weighted directed edges between nodes represent the number of mentions of an event from "mentioning country" towards "source country of event". For example, if two news agencies from Austria mention the Formula 1 Grand Prix in Canada, then Austria will have an edge with weight = 2 towards Canada. A higher in-degree means that a country is mentioned more often, and is therefore more influential in our view.
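The weighted in-degree described here is the sum of the weights of all edges pointing at a country. A minimal sketch in plain Python follows; only the Austria to Canada edge with weight 2 comes from the example above, while the other edges and the function name are hypothetical.

```python
from collections import defaultdict

def weighted_in_degree(edges):
    """Sum incoming edge weights per target country."""
    totals = defaultdict(int)
    for mentioning, source_of_event, weight in edges:
        totals[source_of_event] += weight
    return dict(totals)

# (mentioning country, source country of event, number of mentions)
mention_edges = [
    ("Austria", "Canada", 2),   # the example from the text
    ("Germany", "Canada", 1),   # hypothetical
    ("Canada", "Germany", 4),   # hypothetical
]
in_degree = weighted_in_degree(mention_edges)
ranking = sorted(in_degree.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

Countries that are never mentioned do not appear in the result; callers that need a zero entry for them should add it from the node list.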

An example of measuring common centralities on the EU dataset, based on tag #GDP is displayed below.

The data is free to use, but please cite the ACSANet repository or our paper in which the data was used:

Bibtex

@inproceedings{topirceanu2017strength, title={Strength of Nations: A Case Study on Estimating the Influence of Leading Countries Using Social Media Analysis}, author={Top{\^\i}rceanu, Alexandru and Udrescu, Mihai}, booktitle={European Network Intelligence Conference}, pages={219--229}, year={2017}, organization={Springer} }

APA

Topîrceanu, A., & Udrescu, M. (2017, September). Strength of Nations: A Case Study on Estimating the Influence of Leading Countries Using Social Media Analysis. In European Network Intelligence Conference (pp. 219-229). Springer, Cham.

SASscore: targeting high-specificity for efficient population-wide monitoring of obstructive sleep apnea

In this project we investigate a novel screening tool for patients suffering from Obstructive Sleep Apnea Syndrome (OSAS), which aims at efficient population-wide monitoring. As such, we present SASscore which provides better specificity while maintaining a high sensitivity.

We process a cohort of 2595 patients from 4 sleep laboratories in Western Romania, recording over 100 sleep, breathing, and anthropometric measurements per patient. Using this data, we compare our SASscore with the state-of-the-art scores STOP-Bang and NoSAS through: area under curve (AUC), sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). We also evaluate the performance of SASscore by considering different Apnea–Hypopnea Index (AHI) diagnosis cut-off points, and show that custom refinements are possible by changing the score's cut-off threshold.

Our SASscore takes decimal values within the interval (2, 7) and has a linear relationship with AHI; it is based on normalizing measures for BMI, neck circumference, systolic blood pressure and Epworth score. By applying the STOP-Bang and NoSAS questionnaires, as well as the SASscore on the patient cohort, we obtain the respective AUC values of 0.69 (95% CI 0.66-0.73, p<0.001), 0.66 (95% CI 0.63-0.68, p<0.001), and 0.73 (95% CI 0.71-0.75, p<0.001), with sensitivities of 0.968, 0.901, 0.829, and specificity values of 0.149, 0.294, 0.359, respectively. When raising the SASscore's diagnosis cut-off from 3 to 3.7, both sensitivity and specificity become roughly 0.6.
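The effect of moving the cut-off can be illustrated with the standard definitions of sensitivity, specificity, PPV and NPV. The sketch below does not compute the SASscore itself, whose formula is not given on this page. It assumes a patient is flagged when the score is greater than or equal to the cut-off, and the ten scores and labels are synthetic, not taken from the WestRo cohort.

```python
def screening_metrics(scores, has_osas, cutoff):
    """Confusion-matrix metrics when patients with score >= cutoff are flagged."""
    tp = fp = tn = fn = 0
    for score, positive in zip(scores, has_osas):
        flagged = score >= cutoff
        if flagged and positive:
            tp += 1
        elif flagged and not positive:
            fp += 1
        elif not flagged and positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Synthetic example: five patients with the condition, five without.
scores   = [3.1, 3.5, 3.9, 4.5, 5.2, 2.5, 2.8, 3.2, 3.6, 4.0]
has_osas = [True] * 5 + [False] * 5

low = screening_metrics(scores, has_osas, cutoff=3.0)
high = screening_metrics(scores, has_osas, cutoff=3.7)
print(low["sensitivity"], low["specificity"])
print(high["sensitivity"], high["specificity"])
```

On this toy data, raising the cut-off trades sensitivity for specificity, which is the same kind of trade-off the paragraph above reports for the real cohort.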

With respect to existing scores, SASscore is a more appropriate screening tool for monitoring large populations, due to its much higher specificity. Moreover, our score can be tailored to increase either sensitivity or specificity, while maintaining the AUC value in an optimal balance.

The anonymized and encrypted "WestRo" cohort (N=2595) can be downloaded here: http://staff.cs.upt.ro/~alext/acsanet/sasscore/ For a password, please email us (alext [at] cs [.] upt [.] ro) describing your needs and usage of the dataset.

If you use the data or our SAS score, please cite the ACSANet repository or our paper:

Bibtex

@article{topirceanu2018sas, title={SAS score: Targeting high-specificity for efficient population-wide monitoring of obstructive sleep apnea}, author={Top{\^\i}rceanu, Alexandru and Udrescu, Mihai and Udrescu, Lucre{\c{t}}ia and Ardelean, Carmen and Dan, Rodica and Reisz, Daniela and Mihaicuta, Stefan}, journal={PloS one}, volume={13}, number={9}, pages={e0202042}, year={2018}, publisher={Public Library of Science San Francisco, CA USA} }

APA

Topîrceanu, A., Udrescu, M., Udrescu, L., Ardelean, C., Dan, R., Reisz, D., & Mihaicuta, S. (2018). SAS score: Targeting high-specificity for efficient population-wide monitoring of obstructive sleep apnea. PloS one, 13(9), e0202042.

Citing ACSANet

Use the following BibTeX citation for the ACSANet library:

Bibtex

@ELECTRONIC{top2014acsanet, author = {Alexandru Topirceanu and Mihai Udrescu}, month = {September}, year = {2014}, title = {{ACSANet}: {ACSA} Complex Networks Dataset Collection}, url = {http://cs.upt.ro/~alext/acsanet} }

APA

Topirceanu, A., & Udrescu, M. (2014, September). ACSANet: ACSA Complex Networks Dataset Collection. http://cs.upt.ro/~alext/acsanet