Building an alternative data platform

The general idea behind this paper is that alternative data platforms need large volumes of data, and that this data often aggregates around a specific use case or dataset (e.g., building materials for a construction project, demographics for a specific area, financial information for a specific company). However, for companies to leverage that data, it needs to be integrated with multiple other data sources (it would be good to find the average number of sources needed here; qualitatively, I think it is at least five for the data to be useful). This leaves companies with a major challenge: properly configuring and managing an ETL process that ingests and merges all the different datasets. A second issue is that the data becomes stale, so it has to be refreshed. Refreshing brings its own problem: since most of these alternative data sources are derived from ML/AI processes, each new run can assign different identifiers to the same real-world objects, which leads to unstable references to those objects.
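To make the unstable-reference problem concrete, here is a toy sketch. It assumes a provider whose model run assigns entity IDs, plus a company that stored those IDs in its own tables at ingestion time. All IDs, company names, and fields below are hypothetical. After a refresh, the join on ID still succeeds, but it silently attaches each internal record to the wrong entity.

```python
# Hypothetical snapshots of an ML-derived dataset: the same real-world companies,
# but the IDs are assigned per model run, so they change between refreshes.
snapshot_v1 = {"ent_001": "Acme Corp", "ent_002": "Globex Ltd"}
snapshot_v2 = {"ent_001": "Globex Ltd", "ent_002": "Acme Corp", "ent_003": "Initech"}

# Hypothetical internal table that stored v1 IDs when the data was first ingested.
internal_refs = {
    "ent_001": {"credit_limit": 50_000},  # meant to describe Acme Corp
    "ent_002": {"credit_limit": 20_000},  # meant to describe Globex Ltd
}


def resolve(refs, snapshot):
    """Join internal records to a snapshot by ID; return entity name -> record."""
    return {snapshot[eid]: rec for eid, rec in refs.items() if eid in snapshot}


before = resolve(internal_refs, snapshot_v1)
after = resolve(internal_refs, snapshot_v2)

for name in sorted(before):
    # No error is raised: the IDs still exist, they just point at different entities.
    print(name, before[name]["credit_limit"], "->", after[name]["credit_limit"])
```

Running it prints `Acme Corp 50000 -> 20000` and `Globex Ltd 20000 -> 50000`. Nothing fails loudly, which is why this kind of drift is hard to catch in an ETL pipeline that only checks whether joins succeed.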