new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Nov 21

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

Human processes video reasoning in a sequential spatio-temporal reasoning logic, we first identify the relevant frames ("when") and then analyse the spatial relationships ("where") between key objects, and finally leverage these relationships to draw inferences ("what"). However, can Video Large Language Models (Video-LLMs) also "reason through a sequential spatio-temporal logic" in videos? Existing Video-LLM benchmarks primarily focus on assessing object presence, neglecting relational reasoning. Consequently, it is difficult to measure whether a model truly comprehends object interactions (actions/events) in videos or merely relies on pre-trained "memory" of co-occurrences as biases in generating answers. In this work, we introduce a Video Spatio-Temporal Reasoning (V-STaR) benchmark to address these shortcomings. The key idea is to decompose video understanding into a Reverse Spatio-Temporal Reasoning (RSTR) task that simultaneously evaluates what objects are present, when events occur, and where they are located while capturing the underlying Chain-of-thought (CoT) logic. To support this evaluation, we construct a dataset to elicit the spatial-temporal reasoning process of Video-LLMs. It contains coarse-to-fine CoT questions generated by a semi-automated GPT-4-powered pipeline, embedding explicit reasoning chains to mimic human cognition. Experiments from 14 Video-LLMs on our V-STaR reveal significant gaps between current Video-LLMs and the needs for robust and consistent spatio-temporal reasoning.

  • 6 authors
·
Mar 14 2

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Most video reasoning models only generate textual reasoning traces without indicating when and where key evidence appears. Recent models such as OpenAI-o3 have sparked wide interest in evidence-centered reasoning for images, yet extending this ability to videos is more challenging, as it requires joint temporal tracking and spatial localization across dynamic scenes. We introduce Open-o3 Video, a non-agent framework that integrates explicit spatio-temporal evidence into video reasoning, and carefully collect training data and design training strategies to address the aforementioned challenges. The model highlights key timestamps, objects, and bounding boxes alongside its answers, allowing reasoning to be grounded in concrete visual observations. To enable this functionality, we first curate and build two high-quality datasets, STGR-CoT-30k for SFT and STGR-RL-36k for RL, with carefully constructed temporal and spatial annotations, since most existing datasets offer either temporal spans for videos or spatial boxes on images, lacking unified spatio-temporal supervision and reasoning traces. Then, we adopt a cold-start reinforcement learning strategy with multiple specially designed rewards that jointly encourage answer accuracy, temporal alignment, and spatial precision. On V-STAR benchmark, Open-o3 Video achieves state-of-the-art performance, raising mAM by 14.4% and mLGM by 24.2% on the Qwen2.5-VL baseline. Consistent improvements are also observed on a broad range of video understanding benchmarks, including VideoMME, WorldSense, VideoMMMU, and TVGBench. Beyond accuracy, the reasoning traces produced by Open-o3 Video also provide valuable signals for test-time scaling, enabling confidence-aware verification and improving answer reliability.

ByteDance ByteDance
·
Oct 23 3

Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception

Multimodal Large Language Models (MLLMs) require high-resolution visual information to perform fine-grained perception, yet processing entire high-resolution images is computationally prohibitive. While recent methods leverage a Region-of-Interest (RoI) mechanism to focus on salient areas, they typically present a difficult trade-off: training-based approaches depend on large-scale annotated datasets, while training-free methods that utilize the model's internal attention are computationally inefficient and less accurate, requiring either multi-pass prefill stages or reliance on the slow auto-regressive decoding process. In this paper, we propose an efficient, annotation-free Self-Distilled Region Proposal Network (SD-RPN) that resolves this trade-off. The SD-RPN is built around a pipeline that transforms the noisy attention maps from the MLLM's middle layers into high-quality pseudo-RoI labels by explicitly denoising the signal and resolving ambiguity. We use these labels to train a lightweight Region Proposal Network (RPN) that learns a more precise localization. This RPN is also highly efficient, predicting the RoI in a single forward pass using features from the MLLM's middle layers, decoupling RoI identification from the auto-regressive generation and avoiding costly multi-pass operations.To validate our approach, we integrate the framework into the LLaVA-1.5 architecture. Despite being trained on only a few (e.g. 10K) question-answer pairs, our method demonstrates exceptional data efficiency and generalization, achieving over a 10% absolute accuracy improvement on unseen benchmarks, including TextVQA, DocVQA, and V-Star. Our work presents a practical and scalable solution for enhancing the fine-grained perception of MLLMs without requiring costly supervision or full model fine-tuning. Code is available at https://github.com/YuHengsss/SD-RPN.

Inflationary Attractors Predictions for Static Neutron Stars in the Mass-Gap Region

In this work we study static neutron stars in the context of several inflationary models which are popular in cosmology. These inflationary models are non-minimally coupled scalar theories which yield a viable inflationary phenomenology in both Jordan and Einstein frames. By considering the constraints from inflationary theories, which basically determine the values of the potential strength, usually considered as a free parameter in astrophysical neutron star works, we construct and solve the Tolman-Oppenheimer-Volkoff equations using a solid python-3 LSODA integrator. For our study we consider several popular inflationary models, such as the universal attractors, the R^p attractors (three distinct model values), the induced inflation, the quadratic inflation, the Higgs inflation and the a-attractors (two distinct model values) and for the following popular equations of state the WFF1, the SLy, the APR, the MS1, the AP3, the AP4, the ENG, the MPA1 and the MS1b. We construct the M-R diagram and we confront the resulting theory with theoretical and observational constraints. As we demonstrate, remarkably, all the neutron stars produced by all the inflationary models we considered are compatible with all the constraints for the MPA1 equation of state. It is notable that for this particular equation of state, the maximum masses of the neutron stars are in the mass-gap region with M>2.5M_{odot}, but lower than the 3 solar masses causal limit. We also make the observation that as the NICER constraints are pushed towards larger radii, as for example in the case of the black widow pulsar PSR J0952-0607, it seems that equations of state that produce neutron stars with maximum masses in the mass gap region, with M>2.5M_{odot}, but lower than the 3 solar masses causal limit, are favored and are compatible with the modified NICER constraints.

  • 2 authors
·
May 9, 2023

Flashlights: An Off-Caustic Lensed Star at Redshift $z$ = 1.26 in Abell 370

We report the discovery of a transient seen in a strongly lensed arc at redshift z_{rm s}=1.2567 in Hubble Space Telescope imaging of the Abell 370 galaxy cluster. The transient is detected at 29.51pm0.14 AB mag in a WFC3/UVIS F200LP difference image made using observations from two different epochs, obtained in the framework of the Flashlights program, and is also visible in the F350LP band (m_{rm F350LP} approx 30.53pm0.76 AB mag). The transient is observed on the negative-parity side of the critical curve at a distance of sim 0.6" from it, greater than previous examples of lensed stars. The large distance from the critical curve yields a significantly smaller macromagnification, but our simulations show that bright, O/B-type supergiants can reach sufficiently high magnifications to be seen at the observed position and magnitude. In addition, the observed transient image is a trailing image with an observer-frame time delay of sim+0.8 days from its expected counterpart, so that any transient lasting for longer than that should have also been seen on the minima side and is thus excluded. This, together with the blue colour we measure for the transient (m_{rm F200LP} - m_{rm F350LP} approx [-0.3,-1.6] AB), rules out most other transient candidates such as (kilo)novae, for example, and makes a lensed star the prime candidate. Assuming the transient is indeed a lensed star as suggested, many more such events should be detected in the near future in cluster surveys with the Hubble Space Telescope and James Webb Space Telescope.

  • 13 authors
·
Nov 2, 2022

Radii, masses, and transit-timing variations of the three-planet system orbiting the naked-eye star TOI-396

TOI-396 is an F6V star (Vapprox6.4) orbited by three transiting planets. The orbital periods of the two innermost planets are close to the 5:3 commensurability (P_b sim3.6 d and P_c sim6.0 d). To measure the masses of the three planets, refine their radii, and investigate whether planets b and c are in MMR, we carried out HARPS RV observations and retrieved photometric data from TESS. We extracted the RVs via a skew-normal fit onto the HARPS CCFs and performed an MCMC joint analysis of the Doppler measurements and transit photometry, while employing the breakpoint method to remove stellar activity from the RV time series. We also performed a thorough TTV dynamical analysis of the system. Our analysis confirms that the three planets have similar sizes: R_b=2.004_{-0.047}^{+0.045}R_{oplus}; R_c=1.979_{-0.051}^{+0.054}R_{oplus}; R_d=2.001_{-0.064}^{+0.063}R_{oplus}. For the first time, we have determined the RV masses for TOI-396b and d: M_b=3.55_{-0.96}^{+0.94}M_{oplus} (rho_b=2.44_{-0.68}^{+0.69} g cm^{-3}) and M_d=7.1pm1.6M_{oplus} (rho_d=4.9_{-1.1}^{+1.2} g cm^{-3}). Our results suggest a quite unusual system architecture, with the outermost planet being the densest. The Doppler reflex motion induced by TOI-396c remains undetected in our RV time series, likely due to the proximity of P_c to the star's rotation period (P_{rot}=6.7pm1.3 d). We also discovered that TOI-396b and c display significant TTVs. While the TTV dynamical analysis returns a formally precise mass for TOI-396c (M_{c,dyn}=2.24^{+0.13}_{-0.67}M_{oplus}), the result might not be accurate owing to the poor sampling of the TTV phase. We also conclude that TOI-396b and c are close to but out of the 5:3 MMR. Our numerical simulation suggests TTV semi-amplitudes of up to 5 hours over a temporal baseline of sim5.2 years.

  • 41 authors
·
Nov 22, 2024

EPOCHS Paper V. The dependence of galaxy formation on galaxy structure at z < 7 from JWST observations

We measure the broad impact of galaxy structure on galaxy formation by examining the ongoing star formation and integrated star formation history as revealed through the stellar masses of galaxies at z < 7 based on JWST CEERS data from the Extended Groth Strip (EGS). Using the morphological catalog of 3965 visually classified JWST galaxies from Ferreira et al. (2023), we investigate the evolution of stars, and when they form, as a function of morphological type as well as galaxies classified as passive and starburst through spectral energy distributions. Although disk galaxies dominate the structures of galaxies at z < 7, we find that these disks are in general either `passive', or on the main-sequence of star formation, and do not contain a large population of starburst galaxies. We also find no significant correlation between morphological type and the star formation rate or colours of galaxies at z < 7. In fact, we find that the morphologically classified `spheroids' tend to be blue and are not found to be predominately passive systems at z > 1.5. We also find that the stellar mass function for disk galaxies does not evolve significantly during this time, whereas other galaxy types, such as the peculiar population, evolve dramatically, declining at lower redshifts. This indicates that massive peculiars are more common at higher redshifts. We further find that up to z sim 7, the specific star formation rate (sSFR) does not vary with visual morphology, but strongly depends on stellar mass and internal galaxy mass density. This demonstrates that at early epochs galaxy assembly is a mass-driven, rather than a morphologically-driven, process. Quenching of star formation is therefore a mass-dominated process throughout the universe's history, likely due to the presence of supermassive black holes.

  • 14 authors
·
May 1, 2024

Synthesis of Sound and Precise Leakage Contracts for Open-Source RISC-V Processors

Leakage contracts have been proposed as a new security abstraction at the instruction set architecture level. Leakage contracts aim to capture the information that processors may leak via microarchitectural side channels. Recently, the first tools have emerged to verify whether a processor satisfies a given contract. However, coming up with a contract that is both sound and precise for a given processor is challenging, time-consuming, and error-prone, as it requires in-depth knowledge of the timing side channels introduced by microarchitectural optimizations. In this paper, we address this challenge by proposing LeaSyn, the first tool for automatically synthesizing leakage contracts that are both sound and precise for processor designs at register-transfer level. Starting from a user-provided contract template that captures the space of possible contracts, LeaSyn automatically constructs a contract, alternating between contract synthesis, which ensures precision based on an empirical characterization of the processor's leaks, and contract verification, which ensures soundness. Using LeaSyn, we automatically synthesize contracts for six open-source RISC-V CPUs for a variety of contract templates. Our experiments indicate that LeaSyn's contracts are sound and more precise (i.e., represent the actual leaks in the target processor more faithfully) than contracts constructed by existing approaches.

  • 5 authors
·
Sep 8

Understanding the Neutron Star Population with the SKA

Since their discovery in the late 1960's the population of known neutron stars (NSs) has grown to ~2500. The last five decades of observations have yielded many surprises and demonstrated that the observational properties of NSs are remarkably diverse. The surveys that will be performed with SKA (the Square Kilometre Array) will produce a further tenfold increase in the number of Galactic NSs known. Moreover, the SKA's broad spectral coverage, sub-arraying and multi-beaming capabilities will allow us to characterise these sources with unprecedented efficiency, in turn enabling a giant leap in the understanding of their properties. Here we review the NS population and outline our strategies for studying each of the growing number of diverse classes that are populating the "NS zoo". Some of the main scientific questions that will be addressed by the much larger statistical samples and vastly improved timing efficiency provided by SKA include: (i) the spin period and spin-down rate distributions (and thus magnetic fields) at birth, and the associated information about the SNe wherein they are formed; (ii) the radio pulsar-magnetar connection; (iii) the link between normal radio pulsars, intermittent pulsars and rotating radio transients; (iv) the slowest possible spin period for a radio pulsar (revealing the conditions at the pulsar death-line); (v) proper motions of pulsars (revealing SN kick physics); (vi) the mass distribution of NSs (vii) the fastest possible spin period for a recycled pulsar (constraining magnetosphere-accretion disc interactions, gravitational wave radiation and the equation-of-state); (viii) the origin of high eccentricity millisecond pulsars (MSPs); (ix) the formation channels for recently identified triple systems; and finally (x) how isolated MSPs are formed. We expect that the SKA will break new ground unveiling exotic systems that will challenge... [abridged]

  • 12 authors
·
Dec 30, 2014

Probing the shape of the Milky Way dark matter halo with hypervelocity stars: a new method

We propose a new method to determine the shape of the gravitational potential of the dark matter (DM) halo of the Milky Way (MW) with the galactocentric tangential velocities of a sample of hypervelocity stars (HVSs). We compute the trajectories of different samples of HVSs in a MW where the baryon distribution is axisymmetric and the DM potential either is spherical or is spheroidal or triaxial with radial-dependent axis ratios. We determine the shape of the DM potential with the distribution of the latitudinal velocity |v_{vartheta}| in axisymmetric Galactic potentials, or with the distribution of |v_{vartheta}| and of a function bar v_{varphi} of the azimuthal velocity in non-axisymmetric Galactic potentials. We recover the correct shape of the DM potential by comparing the distribution of |v_{vartheta}| and bar v_{varphi} against the corresponding distributions of mock samples of HVSs that traveled in DM halos of different shapes. We use the largest possible sample of sim 800 HVSs of 4~M_odot ejected with the Hills mechanism at a rate sim 10^{-4} yr^{-1}, currently outgoing, and located at more than 10 kpc from the Galactic center. In our ideal case of galactocentric velocities with null uncertainties and no observational limitations, our method recovers the correct shape of the DM potential with a success rate Sgtrsim 89% in axisymmetric Galactic potentials, and S > 96% in the explored non-axisymmetric cases. The unsuccessful cases yield axis ratios of the DM potential that are off by pm 0.1. The success rate decreases with decreasing sample size: for example, for a spherical DM halo, S drops from sim 98% to sim 38% when the sample size decreases from sim 800 to sim 40 HVSs. A robust determination of the shape of the DM potential thus requires the measure of the galactocentric velocity of a few hundred genuine HVSs.

  • 5 authors
·
Nov 18, 2021

Sloan Digital Sky Survey IV: Mapping the Milky Way, Nearby Galaxies, and the Distant Universe

We describe the Sloan Digital Sky Survey IV (SDSS-IV), a project encompassing three major spectroscopic programs. The Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) is observing hundreds of thousands of Milky Way stars at high resolution and high signal-to-noise ratio in the near-infrared. The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey is obtaining spatially-resolved spectroscopy for thousands of nearby galaxies (median redshift of z = 0.03). The extended Baryon Oscillation Spectroscopic Survey (eBOSS) is mapping the galaxy, quasar, and neutral gas distributions between redshifts z = 0.6 and 3.5 to constrain cosmology using baryon acoustic oscillations, redshift space distortions, and the shape of the power spectrum. Within eBOSS, we are conducting two major subprograms: the SPectroscopic IDentification of eROSITA Sources (SPIDERS), investigating X-ray AGN and galaxies in X-ray clusters, and the Time Domain Spectroscopic Survey (TDSS), obtaining spectra of variable sources. All programs use the 2.5-meter Sloan Foundation Telescope at Apache Point Observatory; observations there began in Summer 2014. APOGEE-2 also operates a second near-infrared spectrograph at the 2.5-meter du Pont Telescope at Las Campanas Observatory, with observations beginning in early 2017. Observations at both facilities are scheduled to continue through 2020. In keeping with previous SDSS policy, SDSS-IV provides regularly scheduled public data releases; the first one, Data Release 13, was made available in July 2016.

  • 353 authors
·
Feb 28, 2017

Overview of the JWST Advanced Deep Extragalactic Survey (JADES)

We present an overview of the James Webb Space Telescope (JWST) Advanced Deep Extragalactic Survey (JADES), an ambitious program of infrared imaging and spectroscopy in the GOODS-S and GOODS-N deep fields, designed to study galaxy evolution from high redshift to cosmic noon. JADES uses about 770 hours of Cycle 1 guaranteed time largely from the Near-Infrared Camera (NIRCam) and Near-Infrared Spectrograph (NIRSpec) instrument teams. In GOODS-S, in and around the Hubble Ultra Deep Field and Chandra Deep Field South, JADES produces a deep imaging region of ~45 arcmin^2 with an average of 130 hrs of exposure time spread over 9 NIRCam filters. This is extended at medium depth in GOODS-S and GOODS-N with NIRCam imaging of ~175 arcmin^2 with an average exposure time of 20 hrs spread over 8-10 filters. In both fields, we conduct extensive NIRSpec multi-object spectroscopy, including 2 deep pointings of 55 hrs exposure time, 14 medium pointings of ~12 hrs, and 15 shallower pointings of ~4 hrs, targeting over 5000 HST and JWST-detected faint sources with 5 low, medium, and high-resolution dispersers covering 0.6-5.3 microns. Finally, JADES extends redward via coordinated parallels with the JWST Mid-Infrared Instrument (MIRI), featuring ~9 arcmin^2 with 43 hours of exposure at 7.7 microns and twice that area with 2-6.5 hours of exposure at 12.8 microns For nearly 30 years, the GOODS-S and GOODS-N fields have been developed as the premier deep fields on the sky; JADES is now providing a compelling start on the JWST legacy in these fields.

  • 76 authors
·
Jun 4, 2023

Euclid. II. The VIS Instrument

This paper presents the specification, design, and development of the Visible Camera (VIS) on the ESA Euclid mission. VIS is a large optical-band imager with a field of view of 0.54 deg^2 sampled at 0.1" with an array of 609 Megapixels and spatial resolution of 0.18". It will be used to survey approximately 14,000 deg^2 of extragalactic sky to measure the distortion of galaxies in the redshift range z=0.1-1.5 resulting from weak gravitational lensing, one of the two principal cosmology probes of Euclid. With photometric redshifts, the distribution of dark matter can be mapped in three dimensions, and, from how this has changed with look-back time, the nature of dark energy and theories of gravity can be constrained. The entire VIS focal plane will be transmitted to provide the largest images of the Universe from space to date, reaching m_AB>24.5 with S/N >10 in a single broad I_E~(r+i+z) band over a six year survey. The particularly challenging aspects of the instrument are the control and calibration of observational biases, which lead to stringent performance requirements and calibration regimes. With its combination of spatial resolution, calibration knowledge, depth, and area covering most of the extra-Galactic sky, VIS will also provide a legacy data set for many other fields. This paper discusses the rationale behind the VIS concept and describes the instrument design and development before reporting the pre-launch performance derived from ground calibrations and brief results from the in-orbit commissioning. VIS should reach fainter than m_AB=25 with S/N>10 for galaxies of full-width half-maximum of 0.3" in a 1.3" diameter aperture over the Wide Survey, and m_AB>26.4 for a Deep Survey that will cover more than 50 deg^2. The paper also describes how VIS works with the other Euclid components of survey, telescope, and science data processing to extract the cosmological information.

  • 435 authors
·
May 22, 2024

Is planetary inward migration responsible for GJ 504's fast rotation and bright X-ray luminosity? New constraints from eROSITA

The discovery of an increasing variety of exoplanets in very close orbits around their host stars raised many questions about how stars and planets interact, and to which extent host stars' properties may be influenced by the presence of close-by companions. Understanding how the evolution of stars is impacted by the interactions with their planets is fundamental to disentangle their intrinsic evolution from Star-Planet Interactions (SPI)-induced phenomena. GJ 504 is a promising candidate for a star that underwent strong SPI. Its unusually short rotational period (3.4 days), while being in contrast with what is expected by single-star models, could result from the inward migration of a close-by, massive companion, pushed starward by tides. Moreover, its brighter X-ray luminosity may hint at a rejuvenation of the dynamo process sustaining the stellar magnetic field, consequent to the SPI-induced spin-up. We aim to study the evolution of GJ 504 and establish whether by invoking the engulfment of a planetary companion we can better reproduce its rotational period and X-ray luminosity. We simulate the past evolution assuming two different scenarios: 'Star without close-by planet', 'Star with close-by planet'. In the second scenario, we investigate how inward migration and planetary engulfment driven by tides spin up the stellar surface and rejuvenate its dynamo. We compare our tracks with rotational period and X-ray data collected from the all-sky surveys of the ROentgen Survey with an Imaging Telescope Array (eROSITA) on board the Russian Spektrum-Roentgen-Gamma mission (SRG). Despite the very uncertain stellar age, we found that the second evolutionary scenario is in better agreement with the short rotational period and the bright X-ray luminosity of GJ 504, thus strongly favouring the inward migration scenario over the one in which close-by planets have no tidal impact on the star.

  • 7 authors
·
Jan 13

Searching for MobileNetV3

We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small which are targeted for high and low resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation. For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state of the art results for mobile classification, detection and segmentation. MobileNetV3-Large is 3.2\% more accurate on ImageNet classification while reducing latency by 15\% compared to MobileNetV2. MobileNetV3-Small is 4.6\% more accurate while reducing latency by 5\% compared to MobileNetV2. MobileNetV3-Large detection is 25\% faster at roughly the same accuracy as MobileNetV2 on COCO detection. MobileNetV3-Large LR-ASPP is 30\% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.

  • 12 authors
·
May 6, 2019

Towards Robust Mathematical Reasoning

Finding the right north-star metrics is highly critical for advancing the mathematical reasoning capabilities of foundation models, especially given that existing evaluations are either too easy or only focus on getting correct short answers. To address these issues, we present IMO-Bench, a suite of advanced reasoning benchmarks, vetted by a panel of top specialists and that specifically targets the level of the International Mathematical Olympiad (IMO), the most prestigious venue for young mathematicians. IMO-AnswerBench first tests models on 400 diverse Olympiad problems with verifiable short answers. IMO-Proof Bench is the next-level evaluation for proof-writing capabilities, which includes both basic and advanced IMO level problems as well as detailed grading guidelines to facilitate automatic grading. These benchmarks played a crucial role in our historic achievement of the gold-level performance at IMO 2025 with Gemini Deep Think (Luong and Lockhart, 2025). Our model achieved 80.0% on IMO-AnswerBench and 65.7% on the advanced IMO-Proof Bench, surpassing the best non-Gemini models by large margins of 6.9% and 42.4% respectively. We also showed that autograders built with Gemini reasoning correlate well with human evaluations and construct IMO-GradingBench, with 1000 human gradings on proofs, to enable further progress in automatic evaluation of long-form answers. We hope that IMO-Bench will help the community towards advancing robust mathematical reasoning and release it at https://imobench.github.io/.

Accelerating the Search for Superconductors Using Machine Learning

Prediction of critical temperature (T_c) of a superconductor remains a significant challenge in condensed matter physics. While the BCS theory explains superconductivity in conventional superconductors, there is no framework to predict T_c of unconventional, higher T_{c} superconductors. Quantum Structure Diagrams (QSD) were successful in establishing structure-property relationship for superconductors, quasicrystals, and ferroelectric materials starting from chemical composition. Building on the QSD ideas, we demonstrate that the principal component analysis of superconductivity data uncovers the clustering of various classes of superconductors. We use machine learning analysis and cleaned databases of superconductors to develop predictive models of T_c of a superconductor using its chemical composition. Earlier studies relied on datasets with inconsistencies, leading to suboptimal predictions. To address this, we introduce a data-cleaning workflow to enhance the statistical quality of superconducting databases by eliminating redundancies and resolving inconsistencies. With this improvised database, we apply a supervised machine learning framework and develop a Random Forest model to predict superconductivity and T_c as a function of descriptors motivated from Quantum Structure Diagrams. We demonstrate that this model generalizes effectively in reasonably accurate prediction of T_{c} of compounds outside the database. We further employ our model to systematically screen materials across materials databases as well as various chemically plausible combinations of elements and predict Tl_{5}Ba_{6}Ca_{6}Cu_{9}O_{29} to exhibit superconductivity with a T_{c} sim 105 K. Being based on the descriptors used in QSD's, our model bypasses structural information and predicts T_{c} merely from the chemical composition.

  • 2 authors
·
May 17

A JWST Project on 47 Tucanae: Kinematics, energy equipartition and anisotropy of multiple populations

Recent work with JWST has demonstrated its capability to identify and chemically characterize multiple populations in globular clusters down to the H-burning limit. In this study, we explore the kinematics of multiple populations in the globular cluster 47 Tucanae by combining data from JWST, HST, and Gaia. We analyzed velocity dispersion and anisotropy profiles from the cluster center out to sim10R_h. Our findings indicate that while 1G stars are isotropic, 2G stars are significantly radially anisotropic. These results align with the predictions of simulations of the dynamical evolution of clusters where 2G stars are initially more centrally concentrated than 1G stars. Furthermore, we subdivided the 2G population into two subpopulations: 2G_A and 2G_B, with the latter being more chemically extreme. We compared their dynamical profiles and found no significant differences. For the first time, we measured the degree of energy equipartition among the multiple populations of 47 Tucanae. Overall, within the analyzed radial range (sim2-4R_h), both populations exhibit a low degree of energy equipartition. The most significant differences between 1G and 2G stars are observed in the tangential velocity component, where 2G stars are characterized by a stronger degree of energy equipartition than 1G stars. In the radial component, the behavior of 1G and 2G stars is more variable, with differences largely dependent on radius. Finally, our analysis reveals that the ratio of rotational velocity to velocity dispersion is larger for the 2G population, while 1G stars exhibit higher skewness in their tangential proper motions, providing further evidence of differences in the kinematic properties of the 1G and 2G populations.

  • 19 authors
·
Feb 5

RUBIES: a complete census of the bright and red distant Universe with JWST/NIRSpec

We present the Red Unknowns: Bright Infrared Extragalactic Survey (RUBIES), providing JWST/NIRSpec spectroscopy of red sources selected across ~150 arcmin^2 from public JWST/NIRCam imaging in the UDS and EGS fields. RUBIES novel observing strategy offers a well-quantified selection function: the survey is optimised to reach high (>70%) completeness for bright and red (F150W-F444W>2) sources that are very rare. To place these rare sources in context, we simultaneously observe a reference sample of the 2<z<7 galaxy population, sampling sources at a rate that is inversely proportional to their number density in the 3D space of F444W magnitude, F150W-F444W colour, and photometric redshift. In total, RUBIES observes ~3000 targets across 1<z_{phot}<10 with both the PRISM and G395M dispersers, and ~1500 targets at z_{phot}>3 using only the G395M disperser. The RUBIES data reveal a highly diverse population of red sources that span a broad redshift range (z_{spec}sim1-9), with photometric redshift scatter and outlier fraction that are 3 times higher than for similarly bright sources that are less red. This diversity is not apparent from the photometric SEDs. Only spectroscopy reveals that the SEDs encompass a mixture of galaxies with dust-obscured star formation, extreme line emission, a lack of star formation indicating early quenching, and luminous active galactic nuclei. As a first demonstration of our broader selection function we compare the stellar masses and rest-frame U-V colours of the red sources and our reference sample: red sources are typically more massive (M_*sim10^{10-11.5} M_odot) across all redshifts. However, we find that the most massive systems span a wide range in U-V colour. We describe our data reduction procedure and data quality, and publicly release the reduced RUBIES data and vetted spectroscopic redshifts of the first half of the survey through the DJA.

  • 28 authors
·
Sep 9, 2024

IndicSTR12: A Dataset for Indic Scene Text Recognition

The importance of Scene Text Recognition (STR) in today's increasingly digital world cannot be overstated. Given the significance of STR, data intensive deep learning approaches that auto-learn feature mappings have primarily driven the development of STR solutions. Several benchmark datasets and substantial work on deep learning models are available for Latin languages to meet this need. On more complex, syntactically and semantically, Indian languages spoken and read by 1.3 billion people, there is less work and datasets available. This paper aims to address the Indian space's lack of a comprehensive dataset by proposing the largest and most comprehensive real dataset - IndicSTR12 - and benchmarking STR performance on 12 major Indian languages. A few works have addressed the same issue, but to the best of our knowledge, they focused on a small number of Indian languages. The size and complexity of the proposed dataset are comparable to those of existing Latin contemporaries, while its multilingualism will catalyse the development of robust text detection and recognition models. It was created specifically for a group of related languages with different scripts. The dataset contains over 27000 word-images gathered from various natural scenes, with over 1000 word-images for each language. Unlike previous datasets, the images cover a broader range of realistic conditions, including blur, illumination changes, occlusion, non-iconic texts, low resolution, perspective text etc. Along with the new dataset, we provide a high-performing baseline on three models - PARSeq, CRNN, and STARNet.

  • 3 authors
·
Mar 12, 2024

Adding Gradient Noise Improves Learning for Very Deep Networks

Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. The main motivation for these architectural innovations is that they capture better domain knowledge, and importantly are easier to optimize than more basic architectures. Recently, more complex architectures such as Neural Turing Machines and Memory Networks have been proposed for tasks including question answering and general computation, creating a new set of optimization challenges. In this paper, we discuss a low-overhead and easy-to-implement technique of adding gradient noise which we find to be surprisingly effective when training these very deep architectures. The technique not only helps to avoid overfitting, but also can result in lower training loss. This method alone allows a fully-connected 20-layer deep network to be trained with standard gradient descent, even starting from a poor initialization. We see consistent improvements for many complex models, including a 72% relative reduction in error rate over a carefully-tuned baseline on a challenging question-answering task, and a doubling of the number of accurate binary multiplication models learned across 7,000 random restarts. We encourage further application of this technique to additional complex modern architectures.

  • 7 authors
·
Nov 20, 2015

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

We present a simple approach to make pre-trained Vision Transformers (ViTs) interpretable for fine-grained analysis, aiming to identify and localize the traits that distinguish visually similar categories, such as bird species. Pre-trained ViTs, such as DINO, have demonstrated remarkable capabilities in extracting localized, discriminative features. However, saliency maps like Grad-CAM often fail to identify these traits, producing blurred, coarse heatmaps that highlight entire objects instead. We propose a novel approach, Prompt Class Attention Map (Prompt-CAM), to address this limitation. Prompt-CAM learns class-specific prompts for a pre-trained ViT and uses the corresponding outputs for classification. To correctly classify an image, the true-class prompt must attend to unique image patches not present in other classes' images (i.e., traits). As a result, the true class's multi-head attention maps reveal traits and their locations. Implementation-wise, Prompt-CAM is almost a ``free lunch,'' requiring only a modification to the prediction head of Visual Prompt Tuning (VPT). This makes Prompt-CAM easy to train and apply, in stark contrast to other interpretable methods that require designing specific models and training processes. Extensive empirical studies on a dozen datasets from various domains (e.g., birds, fishes, insects, fungi, flowers, food, and cars) validate the superior interpretation capability of Prompt-CAM. The source code and demo are available at https://github.com/Imageomics/Prompt_CAM.

Evolving Normalization-Activation Layers

Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical functions are addition, multiplication and statistical moments. The use of low-level mathematical functions, in contrast to the use of high-level modules in mainstream NAS, leads to a highly sparse and large search space which can be challenging for search methods. To address the challenge, we develop efficient rejection protocols to quickly filter out candidate layers that do not work well. We also use multi-objective evolution to optimize each layer's performance across many architectures to prevent overfitting. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures that go beyond existing design patterns. For example, some EvoNorms do not assume that normalization and activation functions must be applied sequentially, nor need to center the feature maps, nor require explicit activation functions. Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets but also transfer well to Mask R-CNN with FPN/SpineNet for instance segmentation and to BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers in many cases.

  • 4 authors
·
Apr 6, 2020

The Apache Point Observatory Galactic Evolution Experiment (APOGEE)

The Apache Point Observatory Galactic Evolution Experiment (APOGEE), one of the programs in the Sloan Digital Sky Survey III (SDSS-III), has now completed its systematic, homogeneous spectroscopic survey sampling all major populations of the Milky Way. After a three year observing campaign on the Sloan 2.5-m Telescope, APOGEE has collected a half million high resolution (R~22,500), high S/N (>100), infrared (1.51-1.70 microns) spectra for 146,000 stars, with time series information via repeat visits to most of these stars. This paper describes the motivations for the survey and its overall design---hardware, field placement, target selection, operations---and gives an overview of these aspects as well as the data reduction, analysis and products. An index is also given to the complement of technical papers that describe various critical survey components in detail. Finally, we discuss the achieved survey performance and illustrate the variety of potential uses of the data products by way of a number of science demonstrations, which span from time series analysis of stellar spectral variations and radial velocity variations from stellar companions, to spatial maps of kinematics, metallicity and abundance patterns across the Galaxy and as a function of age, to new views of the interstellar medium, the chemistry of star clusters, and the discovery of rare stellar species. As part of SDSS-III Data Release 12, all of the APOGEE data products are now publicly available.

  • 78 authors
·
Sep 17, 2015

Euclid Quick Data Release (Q1). Active galactic nuclei identification using diffusion-based inpainting of Euclid VIS images

Light emission from galaxies exhibit diverse brightness profiles, influenced by factors such as galaxy type, structural features and interactions with other galaxies. Elliptical galaxies feature more uniform light distributions, while spiral and irregular galaxies have complex, varied light profiles due to their structural heterogeneity and star-forming activity. In addition, galaxies with an active galactic nucleus (AGN) feature intense, concentrated emission from gas accretion around supermassive black holes, superimposed on regular galactic light, while quasi-stellar objects (QSO) are the extreme case of the AGN emission dominating the galaxy. The challenge of identifying AGN and QSO has been discussed many times in the literature, often requiring multi-wavelength observations. This paper introduces a novel approach to identify AGN and QSO from a single image. Diffusion models have been recently developed in the machine-learning literature to generate realistic-looking images of everyday objects. Utilising the spatial resolving power of the Euclid VIS images, we created a diffusion model trained on one million sources, without using any source pre-selection or labels. The model learns to reconstruct light distributions of normal galaxies, since the population is dominated by them. We condition the prediction of the central light distribution by masking the central few pixels of each source and reconstruct the light according to the diffusion model. We further use this prediction to identify sources that deviate from this profile by examining the reconstruction error of the few central pixels regenerated in each source's core. Our approach, solely using VIS imaging, features high completeness compared to traditional methods of AGN and QSO selection, including optical, near-infrared, mid-infrared, and X-rays.

  • 274 authors
·
Mar 19

Gaia Data Release 3: Summary of the content and survey properties

We present the third data release of the European Space Agency's Gaia mission, GDR3. The GDR3 catalogue is the outcome of the processing of raw data collected with the Gaia instruments during the first 34 months of the mission by the Gaia Data Processing and Analysis Consortium. The GDR3 catalogue contains the same source list, celestial positions, proper motions, parallaxes, and broad band photometry in the G, G_{BP}, and G_{RP} pass-bands already present in the Early Third Data Release. GDR3 introduces an impressive wealth of new data products. More than 33 million objects in the ranges G_{rvs} < 14 and 3100 <T_{eff} <14500 , have new determinations of their mean radial velocities based on data collected by Gaia. We provide G_{rvs} magnitudes for most sources with radial velocities, and a line broadening parameter is listed for a subset of these. Mean Gaia spectra are made available to the community. The GDR3 catalogue includes about 1 million mean spectra from the radial velocity spectrometer, and about 220 million low-resolution blue and red prism photometer BPRP mean spectra. The results of the analysis of epoch photometry are provided for some 10 million sources across 24 variability types. GDR3 includes astrophysical parameters and source class probabilities for about 470 million and 1500 million sources, respectively, including stars, galaxies, and quasars. Orbital elements and trend parameters are provided for some 800,000 astrometric, spectroscopic and eclipsing binaries. More than 150,000 Solar System objects, including new discoveries, with preliminary orbital solutions and individual epoch observations are part of this release. Reflectance spectra derived from the epoch BPRP spectral data are published for about 60\,000 asteroids. Finally, an additional data set is provided, namely the Gaia Andromeda Photometric Survey (abridged)

  • 456 authors
·
Jul 30, 2022

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Despite tremendous progress in the field of text-to-video (T2V) synthesis, open-sourced T2V diffusion models struggle to generate longer videos with dynamically varying and evolving content. They tend to synthesize quasi-static videos, ignoring the necessary visual change-over-time implied in the text prompt. At the same time, scaling these models to enable longer, more dynamic video synthesis often remains computationally intractable. To address this challenge, we introduce the concept of Generative Temporal Nursing (GTN), where we aim to alter the generative process on the fly during inference to improve control over the temporal dynamics and enable generation of longer videos. We propose a method for GTN, dubbed VSTAR, which consists of two key ingredients: 1) Video Synopsis Prompting (VSP) - automatic generation of a video synopsis based on the original single prompt leveraging LLMs, which gives accurate textual guidance to different visual states of longer videos, and 2) Temporal Attention Regularization (TAR) - a regularization technique to refine the temporal attention units of the pre-trained T2V diffusion models, which enables control over the video dynamics. We experimentally showcase the superiority of the proposed approach in generating longer, visually appealing videos over existing open-sourced T2V models. We additionally analyze the temporal attention maps realized with and without VSTAR, demonstrating the importance of applying our method to mitigate neglect of the desired visual change over time.

  • 5 authors
·
Mar 20, 2024 3