Data-intensive approach in sciences
ISTVAN CSABAI DEPARTMENT OF PHYSICS OF COMPLEX SYSTEMS ELTE EÖTVÖS LORÁND UNIVERSITY, BUDAPEST
Acknowledgement: Ministry of Innovation and Technology NRDI Office, MILAB Artificial Intelligence National Laboratory Program, FIEK_16-1-2016-0005, 2020-4.1.1.-TKP2020, NVKP_16-1-2016-0004, H2020 VEO No. 874735.
NETWORKSHOP 2021.04.07
History of (machine) intelligence / data science
World
Model
History of (machine) intelligence / data science
Model
Instruments
World
Natural intelligence
Homo Sapiens: Technical Specifications
CPU 100 GN (giga-neurons)
Clock frequency 4-32 Hz
CPU cores 1 (male version), 2+ (female v.)
CPU speed 0.1 Flops (floating point op. / sec)
Memory (short term) 7±2 bits (Pollack, I. The information of elementary auditory displays. J. Acoust. Soc. Amer., 1952, 24, 745-749.)
Storage 1TB-2.5PB
Power 20 W
Camera 576Mpix, 24Hz
Touch Yes
Display No
Speakers Mono
GPS No
WIFI No
Bluetooth No
2G/3G/4G/5G No/No/No/No
Latest version update 100 000 BC
Main features:
• Find food
• Escape predators
• Kill enemies
• Find mate and reproduce
History of (machine) intelligence / data science
First "Data Science"
Tabulae Rudolphinae (1627), 23 years,
History of (machine) intelligence / data science
Science - technology - science - technology...
Prototype of modern "data science"
SLOAN DIGITAL SKY SURVEY:
• Imaging: 2.5 terapixel image, 300 million galaxies, 5 optical bands
• Spectroscopy: 640 fibers, 1 million spectra
• 2.5 m telescope, 120 Mpixel camera -> 2.5 Tpixel; 5 years: 10 TB
New issue: BIG DATA !!!
CfA 1989: 1100 galaxies
Huge data tables


Scientific goals and researcher’s perspective
Queries in data space: e.g. separate stars and galaxies
petroMag_i > 17.5
and (petroMag_r > 15.5 or petroR50_r > 2)
and (petroMag_r > 0 and g > 0 and r > 0 and i > 0)
and ( (petroMag_r - extinction_r) < 19.2
  and (petroMag_r - extinction_r < (13.1 + (7/3) * (dered_g - dered_r) + 4 * (dered_r - dered_i) - 4 * 0.18) )
  and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) < 0.2)
  and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > -0.2)
  and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r)) < 24.2) )
or ( (petroMag_r - extinction_r < 19.5)
  and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > (0.45 - 4 * (dered_g - dered_r)) )
  and ( (dered_g - dered_r) > (1.35 + 0.25 * (dered_r - dered_i)) ) )
and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r) ) < 23.3 )
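A query like the one above is a set of cuts in the space of measured magnitudes and colours. As a minimal sketch, the first branch of the selection can be written as a Python predicate on one catalogue row. It assumes the magnitude and colour terms are combined by subtraction, as is usual for extinction corrections and colour indices. The function names and the two toy objects are hypothetical and are not taken from SDSS data; `math.pi` is used in place of the rounded 3.1415.

```python
import math

def c_perp(g, r, i):
    """Colour combination used in the cuts: (r - i) - (g - r)/4 - 0.18."""
    return (r - i) - (g - r) / 4.0 - 0.18

def passes_first_branch(obj):
    """Apply the first branch of the selection to one object (dict of floats)."""
    r_petro = obj["petroMag_r"] - obj["extinction_r"]  # extinction-corrected magnitude
    g, r, i = obj["dered_g"], obj["dered_r"], obj["dered_i"]
    # Surface-brightness-like term: magnitude spread over the half-light area.
    surface_brightness = r_petro + 2.5 * math.log10(2 * math.pi * obj["petroR50_r"] ** 2)
    return (
        r_petro < 19.2
        and r_petro < 13.1 + (7 / 3) * (g - r) + 4 * (r - i) - 4 * 0.18
        and abs(c_perp(g, r, i)) < 0.2
        and surface_brightness < 24.2
    )

# Invented toy rows, for illustration only.
red_object = {"petroMag_r": 18.0, "extinction_r": 0.1, "petroR50_r": 3.0,
              "dered_g": 19.6, "dered_r": 17.9, "dered_i": 17.4}
blue_object = {"petroMag_r": 18.0, "extinction_r": 0.1, "petroR50_r": 3.0,
               "dered_g": 18.3, "dered_r": 17.9, "dered_i": 17.8}

print(passes_first_branch(red_object), passes_first_branch(blue_object))
```

In a database the same predicate is evaluated by the query engine, so only the matching rows leave the server.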
New skills: Indexing, databases
• Reading through the SDSS data once takes ~1 day
• Astronomers should learn: database programming, computational geometry, search trees, ...
• Multidimensional and spherical indexing
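The point of spherical indexing is to avoid reading through the whole catalogue for every question. The sketch below illustrates one simple scheme: objects are bucketed into declination zones, and a cone search scans only the zones that can overlap the search circle. This is an assumed, simplified illustration and not the indexing actually used for SDSS. The zone height, function names and toy catalogue are hypothetical.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.5  # hypothetical zone height in degrees

def unit_vector(ra_deg, dec_deg):
    """Convert sky coordinates to a 3-D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def zone_of(dec_deg):
    """Integer zone number of a declination."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG))

def build_index(objects):
    """objects: iterable of (name, ra_deg, dec_deg); returns zone -> list of objects."""
    index = defaultdict(list)
    for name, ra, dec in objects:
        index[zone_of(dec)].append((name, unit_vector(ra, dec)))
    return index

def cone_search(index, ra_deg, dec_deg, radius_deg):
    """Names of objects within radius_deg of the given position."""
    center = unit_vector(ra_deg, dec_deg)
    cos_radius = math.cos(math.radians(radius_deg))
    lo = zone_of(max(dec_deg - radius_deg, -90.0))
    hi = zone_of(min(dec_deg + radius_deg, 90.0))
    hits = []
    for zone in range(lo, hi + 1):  # only zones that can contain matches
        for name, vec in index.get(zone, []):
            dot = sum(a * b for a, b in zip(center, vec))
            if dot >= cos_radius:  # angular distance <= radius
                hits.append(name)
    return sorted(hits)

# Invented toy catalogue.
catalog = [("a", 10.0, 20.0), ("b", 10.1, 20.05), ("c", 10.0, 25.0), ("d", 190.0, -20.0)]
index = build_index(catalog)
print(cone_search(index, 10.0, 20.0, 0.2))
```

A production index would also restrict the right ascension range inside each zone; the sketch keeps only the zone step to show the idea.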
Modern data science: same trends in biology, environmental sciences, social sciences, ...
Not only astronomy: genomics
Sanger sequencing. First virus sequence, 1977: ΦX174, 5386 nt
Nyitray László, Pál Gábor: A biokémia és molekuláris biológia alapjai (2013)
30 years later: NGS, nanopore
Moore's law in genomics
Sequencing is getting cheaper. More (public) data available.
Human Genome Project (HGP), 1990–2003: 13 years / 2.7 billion USD
2020: a few days
2030?
Loss function optimization
images -> points in N dim space
Loss = number of wrong categorizations (error)
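As a minimal sketch of these two lines: each image is flattened into one point in N-dimensional space, a linear rule assigns a category to each point, and the loss counts the wrong categorizations. The linear classifier, its weights and the 2x2 toy "images" are assumptions made for illustration only.

```python
def flatten(image):
    """Turn a 2-D image (list of rows) into one point in N-dimensional space."""
    return [pixel for row in image for pixel in row]

def predict(weights, bias, point):
    """Linear rule: category 1 if the weighted sum exceeds zero, else 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else 0

def zero_one_loss(weights, bias, points, labels):
    """Loss = number of wrong categorizations."""
    return sum(predict(weights, bias, p) != y for p, y in zip(points, labels))

# Invented 2x2 images: two bright (label 1) and two dark (label 0).
images = [
    [[0.9, 0.8], [0.7, 0.9]],
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.6, 0.7], [0.8, 0.6]],
    [[0.3, 0.2], [0.4, 0.3]],
]
labels = [1, 0, 1, 0]
points = [flatten(img) for img in images]  # four points in 4-dimensional space

weights = [1.0, 1.0, 1.0, 1.0]
print(zero_one_loss(weights, -2.0, points, labels))  # a threshold that separates the classes
print(zero_one_loss(weights, -3.0, points, labels))  # a worse threshold
```

Training means searching for the parameters (here `weights` and `bias`) that make this loss as small as possible.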
Complex systems – complex models
To understand complex systems we need complex models
Complex models, 2M+ parameters!
We need
• Huge amount of data to set up, constrain, parametrize the models
• Powerful computers and clever algorithms
Complex function regression: machine learning!
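Regression by loss minimization can be sketched with the smallest possible model, a line with two parameters fitted by gradient descent on the mean squared error. Models with millions of parameters are typically trained with the same kind of loop, only with more parameters and more data. The function name, learning rate, step count and toy data are assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Gradients of mean((a*x + b - y)^2) with respect to a and b.
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

# Invented noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))
```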
AI Research, Education and Applications @ Eötvös University
Dept. of Physics of Complex Systems
• Genetics -> antibiotic resistance
Matamoros et al., Pataki et al. 2020.
• Mobile sensors -> Parkinson's disease
Pataki @DREAM, Laki et al. 2016
• Mosquito images -> vector-borne diseases
Pataki et al. Sci.Rep. 2021
• Medical imaging -> breast cancer
Ribli et al. @DREAM, Sci. Rep. 2018
• Weak lensing map -> cosmology parameters
Ribli et al. Nature Astro. 2018, MNRAS 2019
• Explainable AI
Ribli et al. in prep.
• Control of aging-related methylation networks
Palla et al. subm.
• Pathology images
SOTE TKP collab.
• Quantum ML
• MSc, PhD courses
Vector-borne diseases: deep learning on MosquitoAlert images
"Zika, dengue, chikungunya, and yellow fever are all transmitted to humans by Ae. aegypti and Ae. albopictus."
F. Bartumeus et al. http://www.mosquitoalert.com/
False(?) negatives:
False(?) positives:
Pataki et al. Sci. Rep. 2021.
Space weather: whistler detection
Language of the genome
Pollen monitoring
Animal health
Deep learning for colorectal cancer pathology
Mammography with deep learning (Faster R-CNN)
• Digital Mammography DREAM challenge
• 1200 participants
• Dezső Ribli, best final result
• the only solution with localization
• AUC = 0.95
• Publication: Nature Scientific Reports (2018)
• 30th most popular of 17,000 articles
• New collaborations with hospitals, clinics
• more training data
• open source plugin
• steps towards licensing
D. Ribli, A. Horváth, Z. Unger, P. Pollner, and I. Csabai. "Detecting and classifying lesions in mammograms with deep learning." Scientific reports (2018)
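The AUC quoted on this slide is the area under the ROC curve. It equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that definition follows. The scores and labels are invented toy values and have no connection to the challenge data; the function name is hypothetical.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented classifier scores: three positive and three negative cases.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.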
Explainable AI: automatic classification enhancement
Any sufficiently advanced technology is indistinguishable from magic. /Arthur C. Clarke/
Indeed, understanding the laws of mechanics enabled us to build pyramids and cathedrals; the steam engine, invented on the basis of the laws of thermodynamics, empowered us to cross oceans and continents, and today we all have "seven-league boots" in our garages. Understanding electrodynamics and quantum mechanics brought us the transistor, which is at the heart of the Internet and of the modern "magic mirrors", the mobile phones. With the advancement of high-throughput techniques, we may be ready to tackle another frontier: life and intelligence, left for last because it is the most sophisticated and complex. The end of diseases, a much longer healthy life, ...?
What miracles will advances in machine learning bring? And what kinds of challenges?
NEW PARADIGMS NEED NEW RESEARCHERS
EDUCATION: We need new scientists who have professional skills both in their
István Csabai
ELTE Dept. of Physics of Complex Systems csabai@elte.hu http://complex.elte.hu/~csabai/
