Changing the research

Because of the problems that came up yesterday, we will most likely change our data!

We don’t know how to fix the dataset so that Leximancer can understand the tweets. It seems impossible for Leximancer to make sense of “Twitter language”.

Our backup plan is to stick with Leximancer but to use newspaper articles instead, which we will collect tomorrow after consulting with Philipp.

If we start the new research, we will use google.com and its “News” search to select the news articles. The keyword will be “Mali”; each of us picks one day (21.3.–24.3.) and creates a zipped text document with 100 articles.
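
A minimal sketch of how one day’s articles could be combined and zipped (in Python; all file and folder names are placeholders, not a fixed part of the plan):

    # Sketch: merge one day's collected articles into a single text
    # file and zip it for Leximancer. Paths are placeholders.
    import zipfile
    from pathlib import Path

    day = "2012-03-21"  # one of 21.3.-24.3.
    articles = sorted(Path(f"articles/{day}").glob("*.txt"))  # ~100 files

    combined = Path(f"mali_{day}.txt")
    with combined.open("w", encoding="utf-8") as out:
        for article in articles:
            out.write(article.read_text(encoding="utf-8").strip())
            out.write("\n\n")  # blank line between articles

    with zipfile.ZipFile(f"mali_{day}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(combined)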

We kindly ask Philipp to come to our group tomorrow first, because it really seems as if we have to start over!

 

8 Comments to “Changing the research”

  1. Philipp Babcicky 8 April 2012 at 4:00 pm #

    Hi Karo, thanks for the update. What’s the problem with the data? There might be a chance to filter/clean the data before I run the export. Two things:

    1.) No worries, I’ll come to your group first tomorrow morning.

    2.) Research on Twitter-discourse has already been done with Leximancer… How did they deal with the data?

    See you tomorrow,
    Philipp

  2. Karo 8 April 2012 at 4:21 pm #

    Hi Philipp,
    The main problem is that Leximancer does not understand Twitter language. The tweets consist of abbreviations etc., so the concepts don’t make any sense even when we remove small words.
    Rine wrote a long message about this yesterday (see our category “problems”)!
    We will show you tomorrow exactly what we mean!
    Thanks for stopping by our group first!
    See you tomorrow!

  3. Philipp Babcicky 8 April 2012 at 4:27 pm #

    Alright… Still, specific words such as “RT” can easily be removed with a stop-list, right?

  4. Robin 8 April 2012 at 7:56 pm #

    We will show it to you on Monday. The problem is not one specific word; it is the way people talk on Twitter, a kind of “nerd English”. They don’t use sentences.

  5. Robin 8 April 2012 at 8:04 pm #

    example: “#Mali… wtf… puhhh…”

    1. all the dots turn it into several sentences
    2. the # confuses Leximancer (but that can be solved)
    3. “puhhh” is not a word

    etc etc…
    thank u for stopping by!

    (“u” also doesn’t work for Leximancer ;-) )
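
    To make the point concrete, here is a rough sketch of what cleaning such tweets could look like (Python; the regexes, the drop-list and the slang map are illustrative assumptions, not anything Leximancer does itself):

        # Sketch: normalise a raw tweet before feeding it to Leximancer.
        # DROP and REWRITE are made-up examples, not a complete list.
        import re

        DROP = {"rt", "wtf"}    # tokens to remove entirely
        REWRITE = {"u": "you"}  # slang -> dictionary word

        def clean_tweet(text: str) -> str:
            text = re.sub(r"https?://\S+", "", text)  # drop URLs
            text = re.sub(r"[@#](\w+)", r"\1", text)  # strip @/# markers
            text = re.sub(r"\.{2,}", ". ", text)      # collapse "..." runs
            words = []
            for word in text.split():
                key = word.strip(".,!?").lower()
                if key in DROP:
                    continue
                words.append(REWRITE.get(key, word))
            return " ".join(words)

        print(clean_tweet("#Mali... wtf... puhhh..."))  # -> "Mali. puhhh."

    Note that even after this kind of cleaning, “puhhh” survives, which is exactly Robin’s point 3.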

  6. Rine 8 April 2012 at 8:05 pm #

    The problems are:
    First, the language. Even though we have uploaded stop-lists in all three languages that appear constantly (English, French and Spanish), Leximancer does not remove small words (like y, la, si) in any language other than English. The way it looks to me and Chica, when you upload text you are only allowed to choose one language for it, not three. Consequently, Leximancer can understand the text if it is all in French, or all in Spanish, but it cannot handle a mix.
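
    One conceivable workaround would be to split the tweet file by detected language before uploading, so each part can get its own language setting. A minimal sketch in Python, assuming the third-party langdetect package and a placeholder file name; detection on texts as short as tweets is unreliable, so this is no guarantee:

        # Sketch: split a mixed-language tweet file into one file per
        # language (pip install langdetect). Short tweets are often
        # misdetected, so treat the result with caution.
        from langdetect import detect

        buckets = {"en": [], "fr": [], "es": []}
        with open("tweets.txt", encoding="utf-8") as f:  # placeholder name
            for line in f:
                try:
                    lang = detect(line)
                except Exception:  # empty or letter-free lines raise
                    continue
                if lang in buckets:
                    buckets[lang].append(line)

        for lang, lines in buckets.items():
            with open(f"tweets_{lang}.txt", "w", encoding="utf-8") as out:
                out.writelines(lines)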

    Second: Twitter language.
    Also, the stop-list removes predefined words with low lexical weight, such as and, the, then, etc. It does NOT remove words that don’t appear in dictionaries, and the majority of tweets are written like “New #Video: @WheresAndrew (literally) dancing in #Malawi http://on.natgeo.com/HYrMC3”… In this tweet, even though it is in English and we have a stop-list, Leximancer would only understand “literally” and “dancing”, and it cannot make sense out of it.
    This is very obvious in our concept map: even if we “kill” all the Spanish and French concepts, there is still no connection between the English concepts, because the tweet words don’t appear in actual sentences. Another problem is that Leximancer automatically creates blocks of text two sentences long, but since the majority of tweets are only one sentence, it links two tweets together as if they were the same text. These issues mean that Leximancer creates meaningless themes that are not reflected in the actual text.
    I also tried setting the prose threshold (which filters out all words that are not in the dictionary) really high. However, then we lose most of the text, as very little of a tweet consists of real words.
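
    As a toy illustration of why a high prose threshold eats the tweets (Leximancer’s actual measure is surely more sophisticated; the mini-dictionary here is a stand-in for a real word list):

        # Toy version of a prose threshold: score a tweet by the share
        # of its tokens found in a dictionary.
        import re

        DICTIONARY = {"new", "video", "literally", "dancing", "in"}

        def prose_ratio(text: str) -> float:
            tokens = re.findall(r"[a-z']+", text.lower())
            return sum(t in DICTIONARY for t in tokens) / max(len(tokens), 1)

        tweet = ("New #Video: @WheresAndrew (literally) dancing "
                 "in #Malawi http://on.natgeo.com/HYrMC3")
        print(round(prose_ratio(tweet), 2))  # 0.42: most tokens fail the check
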
    The other sources I have read that used Leximancer to analyse tweets do not say anything about how they did it. However, it is clear that they did not take random tweets; they, for example, had a school class tweet about a specific topic for a certain time and then extracted those tweets. I would assume that when you are writing for your professor about a specific topic, you write sentences that carry more meaning.

    This is a bit difficult to explain in a post, but after MANY hours of trying, it has become very clear that it is totally unrealistic for us to continue like this. Even if we filter the language manually, the majority of the problems won’t disappear.
    It is also obvious that the analysis we are left with, no matter how we do it, is not good enough and is impossible to interpret from a peace-journalism perspective.
    But we will discuss it more in the morning, and I think it will become clear then. Right now we are conducting a pre-test of the news back-up plan, to make sure it could actually work.

  7. Philipp Babcicky 8 April 2012 at 8:42 pm #

    Thanks for the update… Claire and Rine!

    Claire:

    Ad 1 & 3: the same thing happens in the blogosphere, and analyses are conducted there nonetheless… so that’s not a Twitter-specific phenomenon.

    Ad 2: can be solved, as you said.

    However, if you want to work on the “Back-Up Plan”, that’s fine too…

    Good night and see you all tomorrow morning,
    Philipp

  8. Robin 9 April 2012 at 7:14 am #

    Ok. I didn’t like giving up the tweets, so maybe we acted too fast… I don’t know… let’s talk on Monday. Maybe we don’t need to change, but at least we have a plan B now.

