Explore your Social Circle, improve your Research

Everybody who has a Twitter account and is willing to register on Grabeeter can now try out the “Researcher Affinity Browser” and take part in its evaluation. Grabeeter is a tool to archive and search your tweets. The Researcher Affinity Browser is one of the first applications to expose affinities between Twitter users and is the first web application built on top of the semantic profiling framework. It is intended for researchers who use Twitter for microblogging and tweet about their research, their interests or the conferences they are attending or tracking.

Researcher Affinity Browser

The semantic profiling framework grew over the last year as my thesis project, and its evolution has been documented here on this blog.

So if you are a researcher and are using Twitter, you are most welcome to explore people who have already registered themselves in Grabeeter. You can register from inside the application or here. After registration you have to wait a few hours before your data is analysed and suggestions can be made for you. Once you are registered you can start exploring people by selecting your username.

Progress Report 4

This report is a presentation. I presented it today to members of the research group and fellow thesis students at the KULeuven.

Scientific Profiling Presentation

It explains the evaluation approach for the framework. The term “affinity” is introduced here for the first time. Previously, terms such as shared resources or common entities and interests were used.

“Affinity” expresses the user-centric perspective much better, as well as the fact that it is a subjective notion. It also means that an affinity is not only linked to a certain person but is also time-sensitive, something that could be called a “user context”.

 

Constructing Experts Profiles from Linked Open Data

Inspired by the Linked Open Data (LOD) initiative, Latif et al. have developed a tool that establishes links between the authors of digital journals and relevant semantic resources available in LOD [1]. The proposed system is able to disambiguate authors and can 1) locate, 2) retrieve, and 3) structure the relevant semantic resources. Furthermore, the system constructs comprehensive, aspect-oriented author profiles from heterogeneous LOD datasets on the fly. They investigated the potential of such an approach on a digital journal, the Journal of Universal Computer Science (J.UCS). It is their strong belief that this kind of application can motivate researchers and developers to investigate different application areas where Linked Open Data can contribute, bring added value, and take the idea of open access further.

Because of the strong resemblance to our application, some interesting aspects of their approach will be used in this project.

The emergence of many semantically rich and structured datasets from the Linked Open Data (LOD) movement can facilitate more controlled searches and more fruitful results. Latif et al. employed an automatic technique to find the required information about experts using LOD datasets. The expert profile is discovered, aggregated, clustered, structured, and visualized for the administration of a peer-review system.

Latif et al. designed a system divided into three layers that interact with each other to make the system operational: Expertise Calculation, Visualization, and Locating and Construction of Expert Profile. In their paper [2], they proposed and implemented an automatic technique for discovering this information. The system uses the Linked Data paradigm to acquire semantically rich information. The proposed set of heuristics was able to disambiguate experts, acquire relevant information, and structure that information to produce a coherent view of the expert.

Their system has been implemented for the Journal of Universal Computer Science (J.UCS). It is useful for the J.UCS administration when assigning reviewing duties, because it presents a comprehensive expert profile.

The linking proposed by the authors is helpful in different scenarios, e.g. for users who are searching for research collaborators, for journal administrators who want to assign new reviewers, and for users who want to find experts to seek guidance from. A comprehensive author profile is structured and visualized in one place, providing various opportunities for collaboration. This helps in getting deep insights into an author’s work and personal and professional life.

We will base our user interface on their automatic method of expert profile construction.

 

References

  1. Latif A., Afzal M.T., Helic D., Tochtermann K., Maurer H.: Discovery and Construction of Authors’ Profile from Linked Data (A case study for Open Digital Journal). coronet.iicm.edu (2010)
  2. Latif A., Afzal M.T., Tochtermann K.: Constructing experts profiles from Linked Open Data. Emerging Technologies (ICET)

 

Open Innovation Problem Solver Search

This project presents a case in which technical scientists are linked to each other. The intention is to connect scientists who have similar interests. In “Open Innovation”, experts from different institutions and companies try to collaborate and increase the rate of technological innovation. M. Stankovic wrote a paper [1] to check whether linked data contributes to these efforts.

According to this paper by M. Stankovic, many potential sources of evidence about users’ interests and expertise (e.g., research papers, blogs, activities) are becoming ubiquitously present as Linked Data. The paper presents a research effort to suggest the right way to search for potential Open Innovation problem solvers in Linked Data sources by looking at the structure of the available data sources. In addition, the author sought to develop ways of suggesting domains of expertise that are in some way relevant to the domain of the Open Innovation problem, in order to enable cross-domain solution transfer.

Can linked data assist in expert profiling?

Scientific profiling in social networks involves analysing the content generated by a candidate (user). To determine whether this content (in this case the microblog posts) has scientific relevance, and thus whether a Twitter user is an expert in a certain domain, we link hashtags to the linked data cloud. Specifically, we try to discover scientific conferences, locations, people and events. In the literature we found an important validation for this idea. The general conclusion is that sources are available to build such a system, but they are not properly interlinked. This thesis project is an effort to provide the interlinking between several LOD sources (most importantly Colinda, GeoNames and DBPedia). Other resources can definitely enhance the possibilities of the framework. To prove the case, however, we strictly limit the effort to technical scientific people and we use the hypothesis that people who attend similar scientific conferences are a good match.
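The matching hypothesis can be illustrated with a minimal Python sketch. It assumes that the hashtags of each user have already been collected and that a set of known conference hashtags is available; in the framework that lookup would go through LOD sources such as Colinda, which the hard-coded set below only stands in for. All user names, tags and function names are made up for illustration and are not the framework's actual code.

```python
# Hypothetical stand-in for a conference lookup in the LOD cloud.
KNOWN_CONFERENCES = {"iswc2010", "eswc2010", "ectel2010"}

# Hypothetical users and the hashtags they tweeted.
USER_TAGS = {
    "alice": ["#ISWC2010", "#eswc2010", "#coffee"],
    "bob": ["#iswc2010", "#ESWC2010"],
    "carol": ["#iswc2010", "#music"],
    "dave": ["#coffee"],
}


def conference_tags(tags, known_conferences):
    """Normalise hashtags and keep only those that name a known conference."""
    normalised = {tag.lower().lstrip("#") for tag in tags}
    return normalised & known_conferences


def rank_matches(target, user_tags, known_conferences):
    """Rank other users by the number of conferences shared with `target`."""
    mine = conference_tags(user_tags[target], known_conferences)
    scores = []
    for other, tags in user_tags.items():
        if other == target:
            continue
        shared = mine & conference_tags(tags, known_conferences)
        if shared:
            scores.append((other, len(shared)))
    # Highest number of shared conferences first, then alphabetical.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(rank_matches("alice", USER_TAGS, KNOWN_CONFERENCES))
```

In this toy data "dave" shares the tag #coffee with "alice", but since it is not a conference it does not count towards the match.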

Stankovic et al. studied expert search and profiling systems. Such systems aim to identify candidate experts and rank them with respect to their estimated expertise on a given topic, using available evidence. The authors found that traditional expert search and profiling systems exploit structured data from closed systems (e.g. an email program) or unstructured data from open systems (e.g. the Web). However, on today’s Web there is a growing number of data sets published according to the Linked Data principles, the majority of them being part of the Linked Open Data (LOD) cloud. As LOD connects data and people across different platforms in a meaningful way, one can assume that expert search and profiling systems would benefit from harnessing LOD.

Stankovic et al. conducted several experiments to evaluate the feasibility of existing expert search and profiling approaches on a recent snapshot of the LOD cloud. Their findings indicate that the LOD cloud is already a useful source for some kinds of expert search approaches (e.g., those based on publications and professional events) but still has to meet certain requirements in order to reach its full potential. In the existing literature on expert finding, different authors make different assumptions about what makes an expert and how expertise can be assessed. They called these assumptions expertise hypotheses.

One section presents hypotheses which assume that a user’s online activities related to a certain topic imply his/her expertise in that topic. In order to be a useful evidence source for expert search, the LOD cloud needs to satisfy certain conditions. The authors designed tests to verify whether those conditions are met by the current LOD cloud and conducted these tests for each particular expertise hypothesis. Three hypotheses are particularly relevant to this thesis project:

  • H4: If a user answers questions (on topic X) from experts on topic X, then he might himself be an expert on topic X.
  • H5: If a user is among the first to discover (and share) “important/good” resources (i.e. resources which later become popular) on topic X, then he might be an expert on topic X.
  • H6: If a user participates in a collaborative software development project, then he might be an expert in the programming language that is used in the project.

 

The authors found some interesting linked data resources: for H5, Faviki and Virtuoso (via Sponger), and for H6, DoapStore and RDFOhloh. Q&A sites are a useful source of data about expertise, but despite the possibility of representing them using the SIOC ontology, the authors did not find any such website that provides SIOC-based data export. H4 was thus not applicable to the current LOD cloud.

Faviki is a good example of a source with missing data. It provides useful data about tagging with links to DBPedia, but the time of tagging is missing, which makes it difficult to design expert search approaches based on H5. In some cases LOD is not a good source for expert finding because the datasets that certain hypotheses could use are not interlinked. During their evaluation the authors found some examples of data that would be a useful source of expertise evidence if they were interlinked.

For approaches based on H6, a useful data source is RDFOhloh, the export of data related to software development projects that take place at Ohloh. This source provides both inverse functional properties for the members of the projects and links to DBPedia concepts identifying the programming languages that are used. It is thus perfectly suited for finding experts on specific programming languages.

DoapStore is a promising source for H6-based approaches. It contains data on software development projects and their participants. Although the programming language data are present, they are only given in the form of literals, and links to general concepts (e.g. DBPedia or Freebase ones) are not common. H6-based approaches may rely on RDFOhloh for more complete support. RDFOhloh also provides direct links to DoapStore descriptions, thus making the integration possible despite the lack of links in DoapStore.

The authors concluded that expert search and profiling systems aggregate and analyze certain types of data depending on the types of expertise hypotheses they use. Traditional approaches tend to retrieve their data from closed or limited data corpora. LOD, on the other hand, allows querying the whole Web like a huge database, thus surpassing the limits of closed data sets and closed online communities. They believe that this opens new possibilities for traditional expert search and profiling systems, which usually rely only on data from their local and limited databases or on unstructured data gathered from the Web. LOD also holds great promise to deliver multi-purpose data that can be used to find experts in many domains and with many different expertise hypotheses. In the paper they explored the potential and drawbacks of LOD in comparison to the traditional data sources used for expert search. They asked not only what LOD can do, but also what one can do for LOD to make it an even better source of expertise evidence.

 

References

  • Stankovic M., Wagner C., Laublet P., Jovanovic J.: Looking for experts? what can linked data do for you. In: Linked Data on the Web.

 

Are hashtags a good choice as linked data identifiers?

In this project hashtags are used as the most important identifiers to link users. The tags are considered good identifiers because a user deliberately attaches the hash to engage in a conversation in which others use the same hashtag. But does it make sense to conclude that they are also good identifiers in linked data?

In the paper “Making Sense of Twitter” [1], David Laniado and Peter Mika first took a look at whether hashtags behave as strong identifiers, and thus whether they could serve as identifiers for the Semantic Web. Twitter users have adopted the convention of adding a hash at the beginning of a word to turn it into a hashtag. Hashtags are meant to be identifiers for discussions that revolve around the same topic. When used appropriately, searching on these hashtags returns messages that belong to the same conversation (even if they don’t contain the same keywords), thereby solving the aggregation problem. Coincidentally, this is the same function that strong identifiers (URIs) play in the Semantic Web.

The questions they then asked are which hashtags behave as strong identifiers (if any), and whether they could be mapped to concept identifiers in the Semantic Web. According to the authors there are a number of desirable criteria that a hashtag should fulfill in this role, similar to how ‘cool URIs’ are differentiated from poor URIs: frequency, specificity, consistency in usage and stability over time.

In line with previous works on the analysis of folksonomy systems [2], they capture the semantics of the hashtags by their usage in the social media system. In particular, they represented the meaning of hashtags using a Vector Space Model (VSM) [3]. For this study they relied on a dataset of 539,432,680 messages, collected over the whole month of November 2009 (about 18 million per day).
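To make the Vector Space Model idea concrete, here is a minimal Python sketch. It assumes that a hashtag is represented by the counts of the words that co-occur with it in messages, and that two hashtags are compared by cosine similarity. The exact weighting and metrics used by Laniado and Mika are not described in this post, so this is only an illustration under those assumptions, with made-up messages.

```python
import math
from collections import Counter, defaultdict

# Hypothetical messages, for illustration only.
MESSAGES = [
    "semantic web talk #iswc2010",
    "semantic web talk #iswc",
    "great coffee #monday",
]


def hashtag_vectors(messages):
    """Map each hashtag to a bag of the words it co-occurs with."""
    vectors = defaultdict(Counter)
    for message in messages:
        tokens = message.lower().split()
        tags = {t for t in tokens if t.startswith("#") and len(t) > 1}
        words = [t for t in tokens if not t.startswith("#")]
        for tag in tags:
            vectors[tag].update(words)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    vectors = hashtag_vectors(MESSAGES)
    print(cosine(vectors["#iswc2010"], vectors["#iswc"]))
    print(cosine(vectors["#iswc2010"], vectors["#monday"]))
```

Hashtags that are used in similar contexts get a similarity close to 1, unrelated ones a similarity close to 0. Real tweets would need proper tokenisation and stop-word handling, which this sketch leaves out.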

In order to assess how well their metrics indicate which hashtags represent stable concepts with a unique identity, they performed a manual evaluation on a random sample of 257 hashtags. Slightly more than half of the tags (137) could be associated with a Freebase entry; this is higher than the number of named entities because Freebase also contains some general terms. As expected, most application and sentiment tags could not be mapped to Freebase. Only 33% of application and 14% of sentiment tags could be resolved, and many of these mappings are rough approximations of the intended meaning.

Laniado and Mika found that not all hashtags are used in the same way: not all of them aggregate messages around a community or a topic, not all of them endure in time, and not all of them have an actual meaning. In this work they addressed the issue of evaluating Twitter hashtags as strong identifiers, as a first step towards bridging the gap between Twitter and the Semantic Web. The first contribution of the paper lies in the formalization of the problem and in the elaboration of a number of desired properties for a good hashtag to serve as a URI. Based on these data, they tested the results obtained with the algorithms described in their paper, showing how a combination of the proposed measures can help in assessing which tags are more likely to represent valuable identifiers. These results are promising with respect to the perspective of anchoring Twitter hashtags to Semantic Web URIs and of detecting concepts and entities valuable enough to be treated as new identifiers.

 

References

  1. Laniado, D., Mika, P.: Making sense of Twitter. In: The Semantic Web – ISWC 2010 (2010)
  2. Cattuto, C., Benz, D., Hotho, A., Stumme, G.: Semantic grounding of tag relatedness in social bookmarking systems. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 615–631. Springer, Heidelberg (2008)
  3. Raghavan, V.V., Wong, S.K.M.: A critical analysis of vector space model for information retrieval. Journal of the American Society for Information Science (1986)

 

Progress Report 3

This is the third report of my Master Thesis project in Computer Sciences at Graz University of Technology (TUGraz) and the Katholieke Universiteit Leuven (KULeuven). It combines the first two reports and adds a description of the research carried out in December and January. It gives a background overview to situate this report and discusses the problem statement. An in-depth view of the software architecture is given, and some noteworthy implementation details are revealed. Finally, an updated project plan is motivated.

Master Thesis Progress Report 3

Profiling and Discovery API functions for Grabeeter (TUGraz), 1st version

This is a short user guide on how to use the API for the Semantic Profiling framework (more details will follow). Please note that the “Profiling” and “Discovery” functions used in steps 3 and 4 are under construction, so the results may differ strongly every time you check.

  1. If you have not registered your Twitter account on Grabeeter, do so now!
  2. Wait a few hours until your tweets become available on Grabeeter:
    http://grabeeter.tugraz.at/tweets/<YOUR-NAME>
  3. Check out your analyzed & verified Twitter profile (use the entire URL below). Hint: click on the left tab “Viewer”:
    http://jsonviewer.stack.hu/#http://api.semanticprofiling.net/profile.php?user=<YOUR-NAME>
  4. Discover people who, like you, are registered on Grabeeter, ranked by number of common links (be patient, this could take a few minutes):
    http://jsonviewer.stack.hu/#http://api.semanticprofiling.net/discovery.php?find=persons&user=<YOUR-NAME>
Example of discovery: by common tags & friends
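For application developers, the URLs from the steps above can also be built programmatically. The Python sketch below only constructs the request URLs (without the JSON viewer prefix) and shows how a ranking by number of common links could be computed. The structure of the JSON responses is not documented in this post, so the dictionary passed to `rank_by_common_links` is a purely hypothetical shape, as are the function names.

```python
from urllib.parse import quote, urlencode

GRABEETER_BASE = "http://grabeeter.tugraz.at"
API_BASE = "http://api.semanticprofiling.net"


def tweets_url(user):
    """URL of the archived tweets of `user` on Grabeeter (step 2)."""
    return f"{GRABEETER_BASE}/tweets/{quote(user, safe='')}"


def profile_url(user):
    """URL of the profiling function (step 3)."""
    return f"{API_BASE}/profile.php?{urlencode({'user': user})}"


def discovery_url(user, find="persons"):
    """URL of the discovery function (step 4)."""
    return f"{API_BASE}/discovery.php?{urlencode({'find': find, 'user': user})}"


def rank_by_common_links(common_links):
    """Order persons by how many links they have in common with the user.

    `common_links` maps a person to a list of shared links; this shape is
    an assumption, not the documented response format of the API.
    """
    return sorted(common_links, key=lambda person: (-len(common_links[person]), person))


if __name__ == "__main__":
    print(tweets_url("alice"))
    print(profile_url("alice"))
    print(discovery_url("alice"))
```

Keeping URL construction in small functions makes it easy to follow the API while it is still under construction and its parameters may change.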

Real Time Interlinking of Tweets (API)

In the previous post, I described a demonstration of an HTTP stream with realtime annotated tweets. We improved the annotation by trying to identify concepts and linking them to resources in the LOD cloud, for example DBPedia, GeoNames and Freebase.

http://socialweb.semanticprofiling.net/client

Application developers can also combine the realtime client with the SPARQL endpoint located at:

http://socialweb.semanticprofiling.net/endpoint.php

Direct access to the HTTP stream (output in N-Triples):

http://socialweb.semanticprofiling.net/client/provider.php?q=<KEYWORD>
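As a rough idea of how a client could consume this output, the Python sketch below builds the provider URL for a keyword and splits N-Triples lines into subject, predicate and object. The regular expression is a simplification of the N-Triples grammar, and the sample triples are made up; the vocabulary actually emitted by the stream is not shown in this post.

```python
import re
from urllib.parse import urlencode

PROVIDER = "http://socialweb.semanticprofiling.net/client/provider.php"

# Simplified N-Triples pattern: IRI or blank node, IRI, then IRI,
# blank node or literal (with optional datatype or language tag).
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+'
    r'(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
    r'\s*\.$'
)

# Hypothetical sample output, for illustration only.
SAMPLE = [
    "<http://example.org/tweet/1> <http://purl.org/dc/terms/subject> "
    "<http://dbpedia.org/resource/Semantic_Web> .",
    '<http://example.org/tweet/1> <http://purl.org/dc/terms/title> "Talk at #iswc2010"@en .',
    "# a comment line",
    "not a triple",
]


def stream_url(keyword):
    """URL of the HTTP stream for tweets containing `keyword`."""
    return f"{PROVIDER}?{urlencode({'q': keyword})}"


def parse_ntriples(lines):
    """Yield (subject, predicate, object) for every well-formed line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match:
            yield match.groups()


if __name__ == "__main__":
    print(stream_url("iswc2010"))
    for triple in parse_ntriples(SAMPLE):
        print(triple)
```

For anything beyond a quick experiment, a full RDF library is a better choice than a hand-written pattern, because it also handles escapes and all literal forms correctly.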

Realtime Social Semantic Web

In another, similar project we are trying to bring realtime RDF annotation to the Social Web! We have quite successfully implemented a basic client that uses the Twitter Streaming API filter to track tweets containing certain keywords. You can try it here: http://socialweb.semanticprofiling.net/client/

Realtime Social Semantic Web Effort Client
