The Weibo UVE team serves billions of requests every day, so we have to process billions of log records every day.
The primary data sources include:
- Request logs
- Response logs
- Ad impression logs
- Ad actions (click, like, follow, etc.)
- Internal service logs (ad bidding, ad rendering, etc.)
Data Flow Overview
The following diagram shows the overall data flow of UVE.
Gobblin is used to persist data onto HDFS, on which offline processing is performed.
Thanks to Druid, real-time data analysis becomes much more straightforward and flexible.
Besides real-time analysis, we also need more complex offline processing and analysis. Most of the offline processing is based on Hadoop MapReduce, Hive, HBase, etc.
Monitoring is also very important. When a new feature is released, we have to discover in real time whether anything is going wrong, and we want to be notified when an incident occurs or when impressions are not as expected.
Graphite makes real-time monitoring much easier for us. Based on Graphite, we can easily monitor our data both horizontally and vertically. Horizontal monitoring compares current data with historical data for the same metric, and vertical monitoring compares data from different sources at the same point in time, as the following diagram shows.
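To make the two comparisons concrete, here is a minimal Python sketch. It is not our production code and does not call Graphite at all: the function names, the source names (`request`, `response`, `impression`) and the 10% threshold are all hypothetical, chosen only to illustrate the idea on plain numbers.

```python
from typing import Dict, List


def relative_change(current: float, baseline: float) -> float:
    """Relative difference of `current` against `baseline` (0.1 means +10%)."""
    if baseline == 0:
        # Avoid division by zero: any non-zero value is an unbounded change.
        return 0.0 if current == 0 else float("inf")
    return (current - baseline) / baseline


def horizontal_check(current: float, historical: float) -> float:
    """Horizontal: same metric, now versus an earlier point (e.g. yesterday)."""
    return relative_change(current, historical)


def vertical_check(counts: Dict[str, float], reference: str) -> Dict[str, float]:
    """Vertical: different sources at the same time point, versus one reference."""
    base = counts[reference]
    return {
        name: relative_change(value, base)
        for name, value in counts.items()
        if name != reference
    }


def breaches(gaps: Dict[str, float], threshold: float) -> List[str]:
    """Names whose absolute relative gap exceeds the (hypothetical) threshold."""
    return sorted(name for name, gap in gaps.items() if abs(gap) > threshold)


if __name__ == "__main__":
    # Hypothetical per-minute counts, for illustration only.
    print(horizontal_check(current=80.0, historical=100.0))
    now = {"request": 1000.0, "response": 990.0, "impression": 700.0}
    print(breaches(vertical_check(now, reference="request"), threshold=0.1))
```

In this sketch, a horizontal drop would flag a regression over time (for example after a release), while a vertical gap would flag that one stage of the pipeline is losing data relative to another at the same moment.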