With social media growing ever more popular, we keep being surprised by its new areas of application. One we came across recently is Frrole, a localised newspaper driven purely by tweets relevant to a particular city. It currently covers 20 cities globally. Frrole filters out irrelevant conversations and pulls the most popular tweets, which an algorithm then analyses on several factors.
These tweets are then sorted into categories such as Events, Sports and Deals, much like the sections of a traditional newspaper. The result is a localised social newspaper that gives its users a “brand new way of discovering the social buzz in and about their city”.
Unlike News.me and TweetedTimes, Frrole trusts individuals to publicly announce useful news, and trusts the audience to judge how relevant and important that news is. Whether the source is the most influential media corporation or the guy next door, if people nearby find a tweet relevant, it is automatically promoted to the top in real time.
The name Frrole derives from a Punjabi word that translates as “play around and explore”. The people behind it are a small team at Ciafo, a Bangalore-based startup, and the Frrole team itself has three members. Amarpreet Kalkat is the product lead, handling product design and management, and also doubles as the front-end developer. The other two teammates work part time, in stealth mode. Even so, they have written some elegant, efficient code, and much of the credit for getting Frrole to beta goes to them.
According to Amarpreet, “We wanted to build ‘real-time social information streams’ into our travel portal Travelomy. We were surprised at not finding any ready-made localised streams, so we just decided to build one of our own.”
He adds, “As we started looking into the data available from the social web, it became clear that while the amount of information available online through social media was increasing at a tremendous pace, there were hardly any good products that could make good sense of it. It was also clear that the collective knowledge of the crowd is greater than the knowledge of any professional group, but the signals were so mixed up with noise that nobody was really able to make much sense of it.” Amarpreet started with Twitter, and today Frrole analyses close to 5 million tweets every day across 20 cities, using a single server slice.
Building Frrole has been a tremendous learning exercise, according to Amarpreet. “We are talking about a text analytics system that involves big data, is real-time and has critical external dependencies, and each of these areas has its own challenges,” he says. Another big challenge was building a stable integration with Twitter that could run tens of parallel data feeds and later scale to hundreds. It was not easy and took a lot of fine-tuning, but they now have a Twitter integration that (mostly) runs stably. As they add more cities, the scale will change, and they will probably need to talk to Twitter about access at a larger scale.
The biggest challenge was probably that social media has a heavily skewed noise-to-signal ratio: probably 90% noise to 10% signal, or worse. Building an application that filters out most of the noise and makes sense of what remains is like finding a needle in a haystack. Frrole tackles this with a robust set of data-reduction and processing algorithms that the team keeps improving. They cut the processing load sharply without losing anything statistically significant, and the final output is just 0.2 to 0.5% of the initial input. With accuracy above 90%, these results are impressive for a fully algorithmic system. Content metadata analysis weeds out irrelevant data at multiple levels, based on attributes such as people, places, language and popularity. After that, language processing does the rest of the work in the later stages.
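The article does not describe Frrole's actual algorithms, but the general shape it outlines (cheap metadata filters first, more expensive language processing only on what survives) can be sketched in Java, the language of Frrole's back end. Everything below is hypothetical: the `Tweet` fields, the retweet threshold and the keyword lists are assumptions for illustration, and the keyword matching is only a crude stand-in for real language processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical two-stage filter: cheap metadata checks, then a crude text step.
class TwoStageFilter {
    // Hypothetical tweet shape; not Frrole's real schema.
    record Tweet(String text, String lang, String city, int retweets) {}

    // Assumed popularity threshold for stage 1.
    static final int MIN_RETWEETS = 10;

    // Assumed keyword lists standing in for real language processing.
    // A List keeps category order deterministic when a tweet matches several.
    static final List<Map.Entry<String, List<String>>> CATEGORY_KEYWORDS = List.of(
        Map.entry("Events", List.of("concert", "festival", "meetup")),
        Map.entry("Sports", List.of("match", "goal", "cricket")),
        Map.entry("Deals", List.of("discount", "sale", "offer")));

    // Stage 1: metadata-only checks that discard most tweets cheaply.
    static boolean passesMetadata(Tweet t, String city) {
        return "en".equals(t.lang())
            && city.equalsIgnoreCase(t.city())
            && t.retweets() >= MIN_RETWEETS;
    }

    // Stage 2: keyword match on the text; returns null if no category fits.
    static String categorize(Tweet t) {
        String lower = t.text().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> entry : CATEGORY_KEYWORDS) {
            for (String keyword : entry.getValue()) {
                if (lower.contains(keyword)) {
                    return entry.getKey();
                }
            }
        }
        return null;
    }

    // Runs both stages and returns "Category: text" lines for the tweets that survive.
    static List<String> run(List<Tweet> input, String city) {
        List<String> out = new ArrayList<>();
        for (Tweet t : input) {
            if (!passesMetadata(t, city)) {
                continue; // dropped by stage 1, never reaches stage 2
            }
            String category = categorize(t);
            if (category != null) {
                out.add(category + ": " + t.text());
            }
        }
        return out;
    }
}
```

Running the cheap checks first means the costlier text step only ever sees the small fraction of tweets that pass them, which is the whole point of reducing data before analysing it.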
Frrole currently processes close to 5 million tweets per day across 20 cities. Remarkably, it does so using just one slice of a server hosted on Rackspace Cloud, costing $10 a month. Further scaling will not be exactly linear, but the team expects to cover the English-speaking world on a budget of at most a few hundred dollars.
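For a sense of scale, the figures quoted in this article can be turned into a quick back-of-the-envelope calculation. This is a sketch only: the 30-day month is an assumption, the per-second rate is an average (real traffic is bursty), and the 0.2 to 0.5% output fraction is the one given earlier in the article.

```java
// Back-of-the-envelope figures from the numbers quoted in the article.
class ThroughputMath {
    static final double TWEETS_PER_DAY = 5_000_000;  // from the article
    static final double CITIES = 20;                 // from the article
    static final double COST_PER_MONTH_USD = 10.0;   // Rackspace slice, from the article
    static final double DAYS_PER_MONTH = 30;         // assumption for the estimate
    static final double SECONDS_PER_DAY = 86_400;

    // Average ingest rate; real traffic is bursty, so peaks will be higher.
    static double tweetsPerSecond() {
        return TWEETS_PER_DAY / SECONDS_PER_DAY;
    }

    // Average daily volume per covered city.
    static double tweetsPerCityPerDay() {
        return TWEETS_PER_DAY / CITIES;
    }

    // Monthly server cost divided by monthly volume in millions of tweets.
    static double costPerMillionTweetsUsd() {
        double millionsPerMonth = TWEETS_PER_DAY * DAYS_PER_MONTH / 1_000_000;
        return COST_PER_MONTH_USD / millionsPerMonth;
    }

    // Tweets left after filtering, for an output fraction such as 0.002 (0.2%).
    static double outputTweetsPerDay(double outputFraction) {
        return TWEETS_PER_DAY * outputFraction;
    }
}
```

Under these assumptions the slice handles an average of about 58 tweets per second (250,000 per city per day), costs roughly 7 cents per million tweets, and leaves about 10,000 to 25,000 tweets a day after filtering.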
Frrole uses a combination of Java and LAMP stacks: the back end consists of Java-based web services, and the front end is LAMP-based. The front end and back end are hosted separately, and each runs its own MySQL databases and caching layers. The back end has a single layer of memcaching, while the front end uses both memcaching and database caching. The back end also uses the Spring Security framework for specific modules.
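The article only says the front end uses both memcaching and database caching; it does not say how the two are combined. One common pattern is a read-through lookup that tries the in-memory cache first, then a cache table in the database, and only then calls the back-end web service. The sketch below shows that pattern in Java (Frrole's front end is actually PHP, but the pattern is language-independent). Plain `HashMap`s stand in for memcached and the MySQL cache table, and the back-end call is a hypothetical function passed in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Read-through lookup over two cache levels, falling back to the back end.
class TwoLevelCache {
    private final Map<String, String> memcache = new HashMap<>(); // stand-in for memcached
    private final Map<String, String> dbCache = new HashMap<>();  // stand-in for a MySQL cache table
    private final Function<String, String> backend;               // hypothetical back-end web service call
    private int backendCalls = 0;

    TwoLevelCache(Function<String, String> backend) {
        this.backend = backend;
    }

    String get(String key) {
        String value = memcache.get(key);
        if (value != null) {
            return value; // level 1 hit: fastest path
        }
        value = dbCache.get(key);
        if (value == null) {
            // Miss on both levels: call the back end and persist the result.
            value = backend.apply(key);
            backendCalls++;
            dbCache.put(key, value);
        }
        memcache.put(key, value); // repopulate level 1 for the next request
        return value;
    }

    // Simulates memcached losing its contents (eviction or restart).
    void clearMemcache() {
        memcache.clear();
    }

    int backendCalls() {
        return backendCalls;
    }
}
```

The second level is what keeps a memcached restart from turning into a burst of back-end calls: the in-memory cache is refilled from the database copy instead.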
Frrole's architecture is highly modular and borrows some key concepts from SOA (service-oriented architecture). This modularity lets the team build redundancy and fail-safe mechanisms into the system, both of which are critical for real-time systems. The front end and back end communicate only through specific web service interfaces, and within the back end, the processing engine runs separately from the import, servicing and management modules.
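One way to get fail-safe behaviour out of a service interface is to wrap it: the front end codes against an interface, and a wrapper falls back to the last good response when the back end fails. This is a hypothetical Java sketch; `StreamService`, `topTweets` and `FailSafeStreamService` are illustrative names, not Frrole's API, and serving stale data is just one possible fail-safe strategy.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service contract the front end depends on.
interface StreamService {
    List<String> topTweets(String city, String category);
}

// Wraps a primary service and serves the last good result if it fails.
class FailSafeStreamService implements StreamService {
    private final StreamService primary;
    private final Map<String, List<String>> lastGood = new ConcurrentHashMap<>();

    FailSafeStreamService(StreamService primary) {
        this.primary = primary;
    }

    @Override
    public List<String> topTweets(String city, String category) {
        String key = city + "|" + category;
        try {
            List<String> result = primary.topTweets(city, category);
            lastGood.put(key, result); // remember the latest successful answer
            return result;
        } catch (RuntimeException e) {
            // Degrade gracefully: stale data, or an empty list if we have none.
            return lastGood.getOrDefault(key, List.of());
        }
    }
}
```

Because callers see only the interface, the processing engine behind it can be restarted or replaced without the front end noticing, apart from briefly serving slightly stale results.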
From a design perspective, another key challenge was deciding between brute-force analysis of all the data and smart data reduction up front. That involved making a lot of assumptions, validating them, tweaking them and so on. As of now, the system reduces data by almost a factor of 3 while losing virtually nothing.
The team also enjoyed deliberately constraining its hardware resources, aiming to make the code efficient enough to extract the most from the available hardware before building redundancy and margins into the system.
The team completed the first proof of concept (POC) around April-May this year. After a crunch phase during the release of Travelomy, they restarted work on Frrole in the last week of July. The private alpha was released around mid-September and the public alpha around the first week of October. The public beta now covers 20 cities across the globe.
Geographical scaling is high on the agenda. Frrole currently processes close to 5 million tweets every day across 20 cities; the goal is to scale up to 100 cities and 20 million tweets within the next 6 months. It is a stretch goal, but that is what makes it all the more fun.
The team will also keep improving its algorithms and expanding the list of categories it covers. The algorithms in particular are a never-ending exercise. The key challenge is staying far enough ahead of the curve to keep wowing users, which is much easier said than done. Google Search is a great example of an already good product that keeps getting better, and Amarpreet hopes to take a leaf or two out of its book.
Frrole also plans to open its APIs to third-party developers and teams by January. For now, the team is looking at a private alpha to get early feedback for improving the APIs. In return, participants will get a free subscription for a sizeable period, letting them experiment and see whether the service adds value to their products.